Closed terbed closed 7 months ago
Hi @terbed, Can you share the full execution log? Which agent version did you use when running the experiment remotely?
Hi @jkhenning, CLEARML-AGENT version 1.8.0 Self hosted server: WebApp: 1.15.0-472 • Server: 1.15.0-472 • API: 2.29
Unfortunately, I have only another log where I tried to set up python3.11, which did not work out well: task_a6e71e6fdd704d68a45b4f5ec6eb6ad8.log
I've lost the log from the mentioned issue, as I made a workaround by adding this to the config:
# optional shell script to run in docker when started before the experiment is started
extra_docker_shell_script: ["pip install torchvision", "pip install lightning[pytorch-extra]"]
which is far from ideal.
Some updates on the issue:
__init__.py
file was missing in my src folder. Adding this file clearml successfully recognized the torchvision
package and some additional missing packages.lightning[pytorch-extra]
package.The full log: task_9f8e874ec6eb49ada468f18be0059d3f.log
In the other issue page I got the answer: https://github.com/allegroai/clearml/issues/1245#issuecomment-2056962265
Hello,
I cannot reproduce experiments remotely, because the environment is not constructed correctly. The recognized packages:
Actual packages in the environment:
So the remotely reproduced training fails because torchvision is not installed in the env: