allegroai / clearml-server

ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
https://clear.ml/docs

Cannot delete task. #178

Closed sav116 closed 1 year ago

sav116 commented 1 year ago

When I try to delete a task, I get:

APIError: code 500/100: General data error (TransportError(503, 'search_phase_execution_exception', None)).

Versions:
ClearML server: "5.5.1" (value from helm chart)
client: "1.9.1"
k8s: "1.24.6"

My code:

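from clearml.backend_api.session.client import APIClient

client = APIClient()
tasks = client.tasks.get_all()
for task in tasks:
    # Delete only the task(s) whose name matches
    if task.name == 'some_task_name':
        client.tasks.delete(task=task.id, force=True)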

My full APIError Traceback:

APIError                                  Traceback (most recent call last)
Cell In[19], line 1
----> 1 client.tasks.delete(task='f0537c22c41949e5952c8fdd202e0118', force=True)

File ~/.local/lib/python3.8/site-packages/clearml/backend_api/session/client/client.py:378, in make_action.<locals>.new_func(self, *args, **kwargs)
    376 @wrap
    377 def new_func(self, *args, **kwargs):
--> 378     return Response(self.session.send(request_cls(*args, **kwargs)))

File ~/.local/lib/python3.8/site-packages/clearml/backend_api/session/client/client.py:122, in StrictSession.send(self, request, *args, **kwargs)
    120 result = super(StrictSession, self).send(request, *args, **kwargs)
    121 if not result.ok():
--> 122     raise APIError(result)
    123 if not result.response:
    124     raise APIError(result, extra_info="Invalid response")

APIError: APIError: code 500/100: General data error (TransportError(503, 'search_phase_execution_exception', None))
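
Because the loop above stops at the first APIError, one failing task hides whether the others can be deleted. A minimal sketch of a loop that records failures and keeps going is shown here; `delete_all` and `delete_one` are hypothetical names introduced for illustration and are not part of the ClearML client.

```python
from typing import Callable, Iterable, List, Tuple


def delete_all(
    task_ids: Iterable[str],
    delete_one: Callable[[str], object],
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Call delete_one for every id and collect failures instead of aborting.

    delete_one is a hypothetical callback; with the ClearML client it could
    wrap client.tasks.delete(task=task_id, force=True).
    """
    deleted: List[str] = []
    failed: List[Tuple[str, str]] = []
    for task_id in task_ids:
        try:
            delete_one(task_id)
            deleted.append(task_id)
        except Exception as exc:  # e.g. the APIError shown in the traceback
            # Keep the message so the failing ids can be reported together.
            failed.append((task_id, str(exc)))
    return deleted, failed
```

With the client from the snippet above, this could be called as `delete_all(ids, lambda tid: client.tasks.delete(task=tid, force=True))`. If every id ends up in `failed` with the same 503 message, the problem is more likely on the server side than in a single task.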

evg-allegro commented 1 year ago

Hi @sav116, what version of clearml do you use? Does it happen when deleting any task or only this one? Can you please share the apiserver logs?

sudo docker logs clearml-apiserver > api.log 2>&1

sav116 commented 1 year ago

@evg-allegro, unfortunately there are no logs. I use these versions: WebApp: 1.9.2-317 • Server: 1.9.2-317 • API: 2.23

The problem was solved by restarting the apiserver and Elasticsearch pods. I tried this because the graphs in the PLOTS tab were no longer displayed, and Elasticsearch-related errors also appeared in the server logs when deleting tasks/experiments.
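
Since both symptoms pointed at Elasticsearch, one way to check it before restarting anything is its `_cluster/health` endpoint. The sketch below is only an illustration: the address `http://localhost:9200` is an assumption (on Kubernetes it would typically need a port-forward to the Elasticsearch service), and `fetch_cluster_health` and `describe_health` are hypothetical helper names. The thread does not confirm what state the cluster was actually in.

```python
import json
from urllib.request import urlopen


def fetch_cluster_health(base_url: str = "http://localhost:9200",
                         timeout: float = 5.0) -> dict:
    """Fetch Elasticsearch cluster health as a dict (base_url is an assumption)."""
    with urlopen(f"{base_url}/_cluster/health", timeout=timeout) as resp:
        return json.load(resp)


def describe_health(health: dict) -> str:
    """Turn the 'status' field of a cluster-health response into a short hint."""
    status = health.get("status", "unknown")
    unassigned = health.get("unassigned_shards", 0)
    if status == "green":
        return "green: all shards are assigned"
    if status == "yellow":
        return f"yellow: some replica shards are unassigned ({unassigned})"
    if status == "red":
        return f"red: some primary shards are unassigned ({unassigned}); searches may fail"
    return f"unknown status: {status!r}"
```

Calling `describe_health(fetch_cluster_health())` prints a one-line summary. A `red` status would be consistent with search requests failing, although other causes are possible.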