allegroai / clearml-server

ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
https://clear.ml/docs

Disabling saving of models to local clearml ie. /opt/clearml/data/fileserver #103

Open okyspace opened 2 years ago

okyspace commented 2 years ago

Issue

Our ClearML server crashed because clients were uploading models to the ClearML server itself, which is the default behavior. Is it possible to have an overriding config so that clients can never upload to the ClearML server by default?

Recommended in slack clearml-community

To disable file server, simply comment out the fileserver service in docker-compose file.

Tested

  1. Commented out the fileserver service as below. This produced the error: ERROR: Service 'apiserver' depends on service 'fileserver' which is undefined.
  # fileserver:
  #   networks:
  #     - backend
  #     - frontend
  #   command:
  #   - fileserver
  #   container_name: clearml-fileserver
  #   image: allegroai/clearml:latest
  #   restart: unless-stopped
  #   volumes:
  #   - /opt/clearml/logs:/var/log/clearml
  #   - /opt/clearml/data/fileserver:/mnt/fileserver
  #   - /opt/clearml/config:/opt/clearml/config
  #   ports:
  #   - "8081:8081"
  2. Further commented out fileserver in apiserver.depends_on. ClearML basically cannot start. These are the errors found:

    • clearml-agent-services | clearml_agent: ERROR: Connection Error: it seems api_server is misconfigured. Is this the ClearML API server http://apiserver:8008 ?
    • clearml-webserver exited with code 1, as host not found in upstream "fileserver"
  3. We also tested starting the ClearML server as normal and then killing the fileserver. With this, ClearML continues to work and uploading of models to the ClearML server is disabled.

May I ask how to disable the fileserver correctly?

jkhenning commented 2 years ago

Hi @okyspace,

Sorry, I must have missed the last message to the thread in Slack 🙁

As for the best way to disable it: after some more thought, it occurred to me that the best way might be to simply remove the published 8081 port. This way nobody can access the fileserver from outside the docker network (i.e. clients trying to reach http://hostname:8081 won't be able to), but internal components (like the services agent and the webserver reverse proxy) can still reach it. So you can simply revert the commenting out of the fileserver, and just comment out the ports section.

Keep in mind, because of the reverse proxy, the fileserver will still be accessible using http://hostname:8080/files. Changing that (or solving the errors you've encountered) will basically require patching the reverse proxy configuration which might be a bit complicated.
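A quick way to confirm from a client machine that the port is no longer published is a plain TCP connection test. The following is a minimal sketch, not part of ClearML; the helper name `can_connect` is hypothetical, and it only checks TCP reachability, so it says nothing about the `/files` route on port 8080 mentioned above.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and opens a TCP connection;
        # the context manager closes the socket again immediately.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Run from a client outside the docker network, `can_connect("hostname", 8081)` would be expected to return False once the ports section is commented out, while `can_connect("hostname", 8080)` should still return True.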

okyspace commented 2 years ago

Hi @jkhenning, thanks.

I have tried disabling the publishing of port 8081 by commenting it out in the docker-compose file. It did disable the writing of models to /opt/clearml/data/fileserver, which is what I need.

However, I am also using clearml-datasets. When I ran a simple test with the code below, it could not connect. I have appended a screenshot of the error below. When I reverted the docker-compose file so that 8081 is published again, it works fine. Is there any other way around this?

from clearml import Dataset
myDataset = Dataset.create(dataset_project=DATASET_PROJECT, dataset_name=DATASET_NAME)

Error image

jkhenning commented 2 years ago

That's because ClearML Datasets upload data too, and by default they also upload to the fileserver. To change that, you can configure different storage in your clearml.conf file.

okyspace commented 2 years ago

@jkhenning, just to clarify, do you mean changing this fileserver setting to point to another storage? api { files_server: http://localhost:8081

If yes, the issue we face is that not all developers will remember to change the fileserver setting to the external storage, or to add output_uri in Task.init() to upload to external storage. As a result, the storage on our ClearML server VM was used up and the server crashed. That's why we are seeking advice on how to disable uploading to the fileserver.

Is there any way to ensure that developers change the fileserver setting or add output_uri, e.g. some checks in the ClearML Task before proceeding?
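One possible client-side check, sketched here as a team convention rather than a ClearML feature, is a small guard that rejects a missing output_uri or one that points back at the ClearML server. Everything named below is an assumption for illustration: the helper `require_external_output_uri`, the blocked port 8081, and the `/files` path (both taken from this thread). It cannot stop a developer who bypasses it.

```python
from urllib.parse import urlparse

# Assumptions taken from this thread: the fileserver listens on 8081 and is
# also exposed through the webserver reverse proxy under /files.
BLOCKED_FILESERVER_PORTS = {8081}
BLOCKED_PATH_PREFIX = "/files"


def require_external_output_uri(output_uri, server_host):
    """Return output_uri unchanged if it does not target the ClearML fileserver.

    Raises ValueError if output_uri is empty or is an http(s) URL on
    `server_host` that uses the fileserver port or the /files route.
    """
    if not output_uri:
        raise ValueError("output_uri must be set to an external storage location")
    parsed = urlparse(output_uri)
    if parsed.scheme in ("http", "https") and parsed.hostname == server_host.lower():
        if parsed.port in BLOCKED_FILESERVER_PORTS or parsed.path.startswith(BLOCKED_PATH_PREFIX):
            raise ValueError(f"output_uri {output_uri!r} points at the ClearML fileserver")
    return output_uri
```

A shared wrapper could then call something like `Task.init(project_name=..., task_name=..., output_uri=require_external_output_uri(uri, "clearml.example.com"))`, where the host name and `uri` are placeholders for your own deployment.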

jkhenning commented 2 years ago

Well, it seems to me you have what you need: once you disable the 8081 port mapping on the server, users will have to change their settings in order to be able to work, won't they?

fadishaar84 commented 1 year ago

@okyspace did you manage to solve it? I'm facing the same issue after commenting out port 8081 in the docker-compose file. Any solution?