allegroai / clearml

ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
https://clear.ml/docs
Apache License 2.0
5.7k stars 657 forks

When attempting to send a request with event size exceeding max_req_size, several subsequent events are also lost. #1316

Open Legyan opened 3 months ago

Legyan commented 3 months ago

Describe the bug

The issue occurs here: https://github.com/allegroai/clearml/blob/master/clearml/backend_api/session/session.py#L561

If any line in req_data exceeds max_req_size, a MaxRequestSizeError is raised and the lines that follow it are never sent. Moreover, the exception's error message never appears in the logs, because raise_on_errors=False is set here: https://github.com/allegroai/clearml/blob/master/clearml/backend_interface/metrics/interface.py#L244
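The failure mode described above can be illustrated with a small standalone model. This is only a sketch and not the SDK's actual code: `send_lines`, `report`, and the local `MaxRequestSizeError` class are hypothetical stand-ins, and size is measured in characters for simplicity.

```python
class MaxRequestSizeError(Exception):
    """Local stand-in for the SDK exception of the same name."""


def send_lines(lines, max_req_size, send):
    """Hypothetical sender: raises on the first oversized line."""
    for line in lines:
        if len(line) > max_req_size:
            # Raising here aborts the loop, so later lines are never reached.
            raise MaxRequestSizeError(
                "line of size {} exceeds max_req_size {}".format(len(line), max_req_size)
            )
        send(line)


def report(lines, max_req_size, send, raise_on_errors=False):
    """Hypothetical caller that mimics raise_on_errors=False."""
    try:
        send_lines(lines, max_req_size, send)
    except MaxRequestSizeError:
        if raise_on_errors:
            raise
        # Otherwise the error is swallowed and nothing is logged.


sent = []
events = ["a" * 10, "b" * 10, "c" * 100, "d" * 10, "e" * 10]
report(events, max_req_size=25, send=sent.append)
# Only the two events before the oversized one were delivered.
print(len(sent), "of", len(events), "events sent")
```

In this model the third event is too large, and the fourth and fifth are lost along with it even though they are well under the limit.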

To reproduce

Simultaneously send several plots, one of which exceeds the max_req_size.

Expected behaviour

The ClearML SDK should not lose events that do not exceed the max_req_size, and it should log the loss of events that are larger than max_req_size.
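One way to get the expected behaviour, again as a standalone sketch rather than a patch against the SDK: skip only the oversized event, log a warning for it, and keep sending the rest. The function name `send_lines_tolerant` and the logger name are assumptions made for illustration, and size is measured in characters for simplicity.

```python
import logging

log = logging.getLogger("events_sketch")  # hypothetical logger name


def send_lines_tolerant(lines, max_req_size, send):
    """Send every line that fits; log and skip the ones that do not.

    Returns the number of dropped lines.
    """
    dropped = 0
    for index, line in enumerate(lines):
        if len(line) > max_req_size:
            # Report the loss instead of aborting the whole batch.
            log.warning(
                "Dropping event %d: size %d exceeds max_req_size %d",
                index, len(line), max_req_size,
            )
            dropped += 1
            continue
        send(line)
    return dropped


delivered = []
events = ["a" * 10, "b" * 10, "c" * 100, "d" * 10, "e" * 10]
dropped_count = send_lines_tolerant(events, max_req_size=25, send=delivered.append)
```

Here only the single oversized event is dropped, and the drop is visible in the logs.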

Environment

ainoam commented 3 months ago

Thanks for reporting @Legyan - We'll try to address this in an upcoming release.