allegroai / clearml

ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
https://clear.ml/docs
Apache License 2.0
5.69k stars, 655 forks

clearml.storage - ERROR - Exception encountered while uploading #1015

Open senovr opened 1 year ago

senovr commented 1 year ago

Describe the bug

I am running sklearn's cross_validate method to iterate through different models and generate a table of summary scores. From time to time, I get the following message in the console:

- clearml.storage - ERROR - Exception encountered while uploading Failed uploading object /project/vanilla models training.bacef4ffc6534576b31e590d6304cd3f/artifacts/notebook/01_training.ipynb (413): <html>
<head><title>413 Request Entity Too Large</title></head>
<body>
<center><h1>413 Request Entity Too Large</h1></center>
<hr><center>openresty/1.15.8.1</center>
</body>
</html>

At the same time, other artifacts (such as matplotlib plots and summary tables) are still being uploaded to the server.

Can you please give me a hint about what this error message indicates and how I can get rid of it?

UPD: During one experiment, the process seemed to get stuck because of the error above. After training one of the models, it kept printing the exception without progressing any further.
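The 413 status in the log means the HTTP request body was rejected as too large. The object that failed is the notebook itself (artifacts/notebook/01_training.ipynb), while smaller artifacts go through. Notebooks store cell outputs, including embedded images, so they can grow quickly. The sketch below lists files larger than a given limit. The function name files_over_limit, the file names in the usage part, and the 1 MiB limit (nginx's default client_max_body_size) are assumptions for illustration, not values taken from this setup.

```python
import os
import tempfile

# Example limit only: 1 MiB is nginx's default client_max_body_size.
# The real limit depends on how the proxy in front of the server is configured.
DEFAULT_LIMIT_BYTES = 1 * 1024 * 1024


def files_over_limit(paths, limit=DEFAULT_LIMIT_BYTES):
    """Return (path, size) pairs for files whose size exceeds `limit` bytes."""
    oversized = []
    for path in paths:
        size = os.path.getsize(path)
        if size > limit:
            oversized.append((path, size))
    return oversized


# Usage with hypothetical files: a small plot and a large notebook.
with tempfile.TemporaryDirectory() as tmp:
    small = os.path.join(tmp, "plot.png")
    big = os.path.join(tmp, "01_training.ipynb")
    with open(small, "wb") as f:
        f.write(b"\0" * 10_000)
    with open(big, "wb") as f:
        f.write(b"\0" * 2_000_000)
    result = files_over_limit([small, big])
    for path, size in result:
        print(os.path.basename(path), size)
```

Only the notebook exceeds the example limit here, which mirrors the pattern in the report: plots and tables upload, the notebook does not.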


jkhenning commented 1 year ago

Hi @senovr,

Did you deploy OpenResty in front of the ClearML server? This looks like a network error from OpenResty limiting the size of uploaded objects (just like nginx does).
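To make the failure mode concrete, here is a minimal sketch that stands in for a size-limiting proxy: a local stdlib HTTP server that answers 413 when a request body exceeds a limit, and a client that uploads one small and one large payload. The handler, the upload helper, and the 1024-byte limit are hypothetical. Real nginx/OpenResty enforces this with the client_max_body_size directive and can reject the request from the Content-Length header alone; the sketch drains the body first only to keep the client side simple.

```python
import http.server
import threading
import urllib.error
import urllib.request

# Hypothetical limit standing in for the proxy's body-size setting.
MAX_BODY_BYTES = 1024


class SizeLimitedHandler(http.server.BaseHTTPRequestHandler):
    """Rejects request bodies larger than MAX_BODY_BYTES with 413."""

    def do_PUT(self):
        length = int(self.headers.get("Content-Length", 0))
        # Drain the body so the client finishes sending before we reply.
        self.rfile.read(length)
        status = 413 if length > MAX_BODY_BYTES else 200
        self.send_response(status)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def upload(url, size):
    """PUT `size` bytes to `url` and return the HTTP status code."""
    req = urllib.request.Request(url, data=b"x" * size, method="PUT")
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises for non-2xx responses; the code is on the error.
        return err.code


server = http.server.HTTPServer(("127.0.0.1", 0), SizeLimitedHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/artifact"

small_status = upload(url, 512)   # under the limit
large_status = upload(url, 4096)  # over the limit
server.shutdown()
server.server_close()
print(small_status, large_status)  # 200 413
```

If this is the cause, raising client_max_body_size in the OpenResty configuration (or uploading without going through that proxy) should let the notebook upload; the thread does not confirm what the reporter's proxy setup looks like.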