Open idantene opened 1 year ago
One thought that came to mind: when setting e.g. agent.package_manager.system_site_packages = true, this does not apply to virtual environments created with poetry to begin with. It should probably update those too.
Hi @idantene,
Following up here...
then this does not apply to virtual environment created with poetry to begin-with
poetry needs to be installed in the system packages, and by default poetry will not inherit from the system packages; it creates a new venv. Also, could you provide the full log? I'm not sure where exactly you are getting the missing module error.
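As a sketch of how the "poetry does not inherit system packages" default could be changed: Poetry exposes a `virtualenvs.options.system-site-packages` setting in recent versions. The helper name `enable_poetry_system_site_packages` is made up for this example, and whether clearml-agent's poetry mode actually honors the setting is an assumption that has not been verified here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: ask Poetry to create venvs that can see system site-packages.
# Assumes a Poetry version that supports virtualenvs.options.system-site-packages.
enable_poetry_system_site_packages() {
    # Bail out early if the poetry CLI is not reachable from this shell.
    if ! command -v poetry >/dev/null 2>&1; then
        echo "poetry not found on PATH" >&2
        return 1
    fi
    # Writes the setting to Poetry's global config for the current user.
    poetry config virtualenvs.options.system-site-packages true
}
```

It could be called once on the instance after poetry is installed; environments that Poetry has already created would likely need to be recreated to pick up the change.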
I understand @jkhenning, and as mentioned, it is installed on the system packages.
The above is the full log, unfortunately (I only trimmed the gazillion packages installed with poetry).
It pops up when clearml-agent is trying to run poetry run python -u train.py (right before the list of installed packages).
@idantene can you get the machine log itself? You should be able to from the AWS cloud console
Sure, there's nothing of interest there though. It installs the packages, poetry, etc, and then runs the agent to listen to the queue.
Is there anything in particular you're after?
I would really like to see what's installed and where
@jkhenning The AWS autoscaler uses an Ubuntu 22.04 AMI, and then it runs the following on new instances (the content of extra_vm_bash_script; comments added for ease of reading):
# Some dependencies we need for our libraries
apt-get install -y gfortran libopenblas-dev liblapack-dev libpq-dev python-is-python3 python3-pip python3-dev proj-bin libgraphviz-dev graphviz graphviz-dev libgdal-dev
# Ensuring Python 3.7, 3.8, 3.9, and 3.10 (comes with Ubuntu 22.04) are available
apt-get install software-properties-common -y
add-apt-repository ppa:deadsnakes/ppa -y
apt update
apt install python3.7 python3.8 python3.9 python3.7-distutils python3.8-distutils python3.9-distutils python3.10-distutils python3.7-dev python3.8-dev python3.9-dev python3.10-dev -y
# We use the https+git authorization rather than the ssh+git
git config --system credential.helper "store --file /root/.git-credentials"
# Ensure a virtualenv exists for each python; needed because `clearml-agent` calls `pythonX.Y -m virtualenv` (see clearml_agent/helper/package/pip_api/venv.py#L68)
python3.7 -m pip install virtualenv
python3.8 -m pip install virtualenv
python3.9 -m pip install virtualenv
python3.10 -m pip install virtualenv
# Install poetry (as per official docs)
curl -sSL https://install.python-poetry.org | python3 -
export PATH="/root/.local/bin:${PATH}"
# Export environment variables for easy AWS access
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
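To answer "what's installed and where" on a fresh instance, a short diagnostic along these lines could be run after the steps above. It is only a sketch: the function name `report_tool` is invented for this example, and the interpreter list simply mirrors the versions installed by the script.

```shell
#!/usr/bin/env bash
# Diagnostic sketch: report which interpreters and tools are visible, and from where.

report_tool() {
    # Print the resolved path of a command, or a "not found" marker.
    local name="$1"
    local resolved
    if resolved="$(command -v "$name" 2>/dev/null)"; then
        echo "$name: $resolved"
    else
        echo "$name: not found"
    fi
}

for py in python3.7 python3.8 python3.9 python3.10; do
    report_tool "$py"
    # Only query virtualenv when the interpreter actually exists.
    if command -v "$py" >/dev/null 2>&1; then
        "$py" -m virtualenv --version 2>/dev/null || echo "  virtualenv missing for $py"
    fi
done

report_tool poetry
report_tool clearml-agent
```

The output of such a check, captured in the instance's system log, would show whether poetry resolves from /root/.local/bin and whether each interpreter has virtualenv available.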
@idantene is this the contents of the extra_vm_bash_script or the actual installation log of this script being executed?
This is the contents of extra_vm_bash_script, as mentioned above. These are the only preparations we do on our end; everything else is taken care of by clearml-agent.
OK, but can you get the actual system log from the AWS machine? It should include the execution output of this script (as well as other stuff).
I will have to find a moment to redo the autoscaler so that it uses poetry (we've since reverted to pip for the time being), but when I did look at it, everything went fine (no errors or warnings). It installed clearml-agent v1.5.1, and ran these additional steps.
I'll get back to you with a raw system log at a later time.
See detailed Slack thread.
The tldr is that, at least when running as an autoscaler, the clearml-agent fails with poetry as the package manager with the following error:
Now, poetry is installed on the machine (I've tried various approaches, see the thread) and all requirements are installed via poetry. The environment also correctly identifies the poetry CLI as existing on the remote machine, as it tries to run poetry run python -u train.py:
From what I gather, all the various agent settings apply to the top-level agent environment (/clearml_agent_venv/), but when this agent daemon creates the environment for the individual tasks, the poetry module is missing (even if, e.g., it is mentioned in clearml.conf under agent.package_manager.priority_packages).