allegroai / clearml

ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
https://clear.ml/docs
Apache License 2.0
5.67k stars 654 forks

[Feature request | clearml-agent] allow to stop a specific agent from the command-line option by incorporating arguments to `--stop` #448

Open talajasi7 opened 3 years ago

talajasi7 commented 3 years ago

For now, the only way I know of to kill a particular agent from the ClearML API is through the arguments that define it (those entered when spinning it up). Beyond that, I think the preferable way to stop an agent would be a command-line option like:

clearml-agent daemon --stop AGENT_PID

It would combine well with the list of agent IDs provided by the clearml-agent list command.

bmartinn commented 3 years ago

Thanks @talajasi7, this is a great idea, and we will definitely add it. For future reference, you can currently stop a specific instance of the agent by running the exact same command with --stop added at the end. For example, launching two agents (one per GPU), then stopping the agent on GPU "1":

clearml-agent daemon --queue default --gpus 0
clearml-agent daemon --queue default --gpus 1

clearml-agent daemon --queue default --gpus 1 --stop
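
The "same command plus --stop" rule above can be sketched in Python, for anyone scripting agent launches. This is only an illustration under assumptions: stop_command is a hypothetical helper, not part of clearml-agent, and it only builds the command string; it does not run anything.

```python
import shlex


def stop_command(launch_command: str) -> str:
    """Build the stop command for an agent from its launch command.

    Hypothetical helper (not a clearml-agent API): it keeps the launch
    arguments unchanged and appends --stop if it is not already there.
    """
    args = shlex.split(launch_command)
    if "--stop" not in args:
        args.append("--stop")
    return shlex.join(args)


# The launch command for the agent on GPU 1, as in the example above.
launch = "clearml-agent daemon --queue default --gpus 1"
print(stop_command(launch))
```

Storing the launch string next to each agent you start makes it easy to rebuild the matching stop command later without retyping the arguments.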

ColdTeapot273K commented 9 months ago

Thanks @talajasi7, this is a great idea, and we will definitely add it. For future reference, you can currently stop a specific instance of the agent by running the exact same command with --stop added at the end. For example, launching two agents (one per GPU), then stopping the agent on GPU "1":

clearml-agent daemon --queue default --gpus 0
clearml-agent daemon --queue default --gpus 1

clearml-agent daemon --queue default --gpus 1 --stop

Such a command consistently stops the wrong agents, from the wrong queues.

ainoam commented 9 months ago

@ColdTeapot273K Perhaps best to open it as an issue for clearml-agent?

jkhenning commented 9 months ago

@ColdTeapot273K Which agent version are you using? I just tried the same sequence of commands with the latest agent version (v1.7.0), and it consistently stops the correct agent.