Open AkideLiu opened 2 years ago
Hi @AkideLiu,
In the ClearML agent, the main idea is that a task that was already executed should be exactly reproducible when cloned, which is why the agent attempts to install the exact same requirements. This is usually not a problem when you're running the agent in docker mode, since the cloned task will likely be executed using the same docker image and will therefore be able to use the exact same requirements. It's only a possible issue in venv mode, which is inherently less stable (i.e. it depends on the actual machine/OS/environment the agent is running on). You can always override this by clearing the requirements from the cloned task in the UI, in which case the agent will try to install requirements.txt from the git repository.
Implementing the behavior you are looking for is actually a new feature request (first try to install the "stored full requirements", then, if that fails, try to install the "original" requirements), and we will put it into our task list 🙂
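To make the requested fallback concrete, here is a minimal sketch of the "stored first, then original" order. It is not the ClearML agent's implementation: `pip_install` and `install_with_fallback` are hypothetical helpers written for illustration, and the sketch assumes both requirement sets are available as plain lists of pip requirement specifiers.

```python
import subprocess
import sys
from typing import Callable, Sequence


def pip_install(requirements: Sequence[str]) -> bool:
    """Hypothetical helper: install the specifiers with pip, return True on success."""
    if not requirements:
        return True  # nothing to install counts as success
    result = subprocess.run(
        [sys.executable, "-m", "pip", "install", *requirements]
    )
    return result.returncode == 0


def install_with_fallback(
    stored: Sequence[str],
    original: Sequence[str],
    installer: Callable[[Sequence[str]], bool] = pip_install,
) -> str:
    """Try the stored (fully pinned) requirements, then the original ones.

    Returns which set was installed: "stored" or "original".
    Raises RuntimeError if both attempts fail.
    """
    # First attempt: the exact requirements recorded on the executed task.
    if installer(stored):
        return "stored"
    # Second attempt: the original requirements, e.g. requirements.txt in the repo.
    if installer(original):
        return "original"
    raise RuntimeError("neither the stored nor the original requirements could be installed")
```

The installer is passed in as a parameter so the fallback order can be exercised without touching the real environment. Note that falling back trades exact reproducibility for a higher chance of the task starting at all, which is the tension described above.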
Hi @jkhenning, thanks for the clarification. Could you please let me know how to remove the cached requirements from the web UI?
Hopefully you can consider implementing some kind of retry for failed dependency installation, because for some use cases it might be hard or even impossible to leverage advanced containerization techniques. For example, a Slurm cluster might not natively support docker mode.
@AkideLiu Please note that as of ClearML Server v1.14.0, package cache is available in the ClearML UI.
Thank you for helping us make ClearML better!
Describe the bug
To reproduce
The scenario is as follows: suppose we have two workers (A and B) in the same queue, on different farms, with different clearml configurations. Additionally, storage is not shared between workers A and B.
Suppose we create task X and it runs properly on worker A. When we clone X into a task called Y, Y may run on worker B. Worker B's agent then tries to install the specific cached requirements.txt from worker A, causing the job to fail because this file does not exist on worker B.
Expected behaviour
How can the agent configuration be modified so that agents do not use the cached requirements.txt from previous jobs?
Environment
Related Discussion