allegroai / clearml-server

ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
https://clear.ml/docs

Deployed `Server` doesn't have a worker for the `services` queue #140

Open gaspardc-met opened 2 years ago

gaspardc-met commented 2 years ago

Hello ClearML team. First, thank you for the great work, and for open-sourcing most of it!

I have a question about the workers/queues on a self-hosted server.

My setup:

Now the pipeline is pending in the Server UI, and when I go to Workers & Queues I see an idle worker, `k8s-agent`, assigned to the `default` queue and no worker for the `services` queue.

My point being: if the `services` queue is important, why doesn't the Server/Agent set up a worker for it by default? Why isn't it possible to successfully move the job from the `services` queue to the `default` queue in the UI? Can I force the pipeline controller to run on the `default` queue along with the steps? I thought this would be possible using `pipeline_execution_queue="default"` in the decorator, but it doesn't seem to work on my side.

Thanks for your feedback
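For readers following along, here is a minimal sketch of how the queue settings mentioned above are usually passed to the decorators. The queue names come from this thread. The pipeline name `queue-demo`, the project `examples`, the helper `pipeline_options` and the toy `step_one` are hypothetical and exist only for illustration. It assumes `clearml` is installed and configured against a server, and it only shows where the parameters go. Whether this resolves the behaviour described above is not established in this thread.

```python
# Queue names taken from the thread; adjust them to match your own setup.
CONTROLLER_QUEUE = "default"  # the controller is sent to "services" unless overridden
STEP_QUEUE = "default"        # queue the k8s-agent worker is listening on


def pipeline_options(name, project, version="0.0.1"):
    """Build keyword arguments for PipelineDecorator.pipeline with explicit queues."""
    return {
        "name": name,
        "project": project,
        "version": version,
        # Queue for the pipeline controller task itself.
        "pipeline_execution_queue": CONTROLLER_QUEUE,
        # Queue for steps that do not set their own execution_queue.
        "default_queue": STEP_QUEUE,
    }


def build_pipeline():
    """Define a toy pipeline (hypothetical names); needs clearml and a server to run."""
    # Imported lazily so the helper above can be used without clearml installed.
    from clearml import PipelineDecorator

    @PipelineDecorator.component(return_values=["doubled"], execution_queue=STEP_QUEUE)
    def step_one(value):
        return value * 2

    @PipelineDecorator.pipeline(**pipeline_options("queue-demo", "examples"))
    def run_pipeline(value=1):
        return step_one(value)

    return run_pipeline
```

Calling `build_pipeline()(value=3)` from a script would then enqueue the controller on `CONTROLLER_QUEUE`. To debug without any agent, `PipelineDecorator.run_locally()` can be called before invoking the pipeline so that the controller and steps run on the local machine.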

jkhenning commented 2 years ago

@gaspardc-met you're quite right. We're planning to add this in one of the next versions; it basically requires some changes to the default docker-compose and some startup automation to support it 🙂

hotshotdragon commented 1 year ago

Hi @jkhenning, I am facing the same issue, but I am not self-hosting the server. The console is stuck at:

`Launching step: step_one Parameters: {'kwargs/pickle_url': 'https://github.com/***/data.pkl'} Configurations: {} Overrides: {}`

When I check the queue, the task shows as pending.

jkhenning commented 1 year ago

Hi @hotshotdragon,

If you are not hosting the server, which server is it?