Open ockaro opened 1 year ago
Hi There!
Thanks again for the detailed write-up. Would you mind testing if the following fix works? It seems like the clearml config file is not mounted inside the necessary containers. Make sure your Azure credentials are added in this config file :)
So you'd add:
volumes:
  - $HOME/clearml.conf:/root/clearml.conf
to here: https://github.com/allegroai/clearml-serving/blob/e09e6362147da84e042b3c615f167882a58b8ac7/docker/docker-compose-triton-gpu.yml#L77 and here: https://github.com/allegroai/clearml-serving/blob/e09e6362147da84e042b3c615f167882a58b8ac7/docker/docker-compose-triton-gpu.yml#L107
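For context, the mount would sit inside the service definitions roughly like this (service name taken from the linked compose file; the surrounding keys are elided and shown only as placeholders):

```yaml
services:
  clearml-serving-triton:
    # ... existing image/environment settings ...
    volumes:
      # Mount the host's ClearML config (which holds the Azure credentials)
      # into the container so the serving process can read it.
      - $HOME/clearml.conf:/root/clearml.conf
```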
If you can confirm this is working, we can make a PR and get this issue sorted out. Thanks a lot for your patience and cooperation!!
Hi @thepycoder, thanks for your answer and sorry for my late reply. I managed to try your recommendations today and made the following findings on my local Windows machine (note that I am using docker-compose-triton.yml, not the GPU version). Right after calling docker-compose I got:
msg="The \"HOME\" variable is not set. Defaulting to a blank string."
Setting the HOME environment variable did not work, so I added it to the .env file that is passed to docker-compose, which got rid of the error. The Triton container then failed with:
clearml-serving-triton | E0217 10:21:25.908301 34 model_repository_manager.cc:2064] Poll failed for model directory 'test_model_pytorch': failed to open text file for read /models/test_model_pytorch/config.pbtxt: No such file or directory
clearml-serving-triton | Info: syncing models from main serving service
clearml-serving-triton | Updating local model folder: /models
clearml-serving-triton | 2023-02-17 10:21:26,079 - clearml.storage - ERROR - Azure blob storage driver not found. Please install driver using: 'pip install clearml[azure]' or pip install '"azure.storage.blob>=12.0.0"'
clearml-serving-triton | Error retrieving model ID 9075dbebef6d4467801da808a6e39570 []
clearml-serving-triton | Info: Models updated from main serving service
clearml-serving-triton | reporting metrics: relative time 123 sec
clearml-serving-inference | Instance [3cf8c573a03e4341aa6f422465d5521b, pid=8]: New configuration updated
clearml-serving-inference | ClearML results page: https://app.clear.ml/projects/c8794acd9c594f4e9f9a9a55b9b76632/experiments/3cf8c573a03e4341aa6f422465d5521b/output/log
clearml-serving-inference | ClearML Monitor: GPU monitoring failed getting GPU reading, switching off GPU monitoring
clearml-serving-inference | ClearML Monitor: Could not detect iteration reporting, falling back to iterations as seconds-from-start
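For reference, the HOME workaround mentioned above amounted to adding a line like the following to the .env file next to the compose file (the exact path is machine-specific; the Windows path below is a hypothetical example):

```
# .env — read by docker-compose for variable substitution
HOME=C:\Users\<your-user>
```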
So it seems like the Azure blob storage driver is not installed inside the Docker container? In the environment from which I call docker-compose, the requirement is already satisfied.
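One quick way to confirm that suspicion is to check whether the driver is importable inside the container (e.g. via docker exec into clearml-serving-triton) versus on the host; a minimal check script:

```python
# Check whether the Azure blob storage driver that ClearML needs is importable
# in the current Python environment.
import importlib.util

try:
    spec = importlib.util.find_spec("azure.storage.blob")
    installed = spec is not None
except ModuleNotFoundError:  # the parent "azure" package is missing entirely
    installed = False

print("azure-storage-blob importable:", installed)
```

If this prints False inside the container but True on the host, the package only exists in the host environment and the container needs it installed separately.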
Hey @ockaro!
Thanks for checking back in! Can you try setting the following environment variable on the serving containers?
CLEARML_EXTRA_PYTHON_PACKAGES="azure-storage-blob"
This should install the Azure blob storage driver for you. If this works, we'll add it to the default requirements :)
Hi @thepycoder, thanks again for your reply.
Do you need any further information?
@ockaro Awesome, thanks a lot for your patience here! We don't need anything else and are working to make the process more painless in the future. Thank you so much for your contributions!
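The extra-packages fix above can be wired into the compose file's environment section, e.g. (service name taken from the clearml-serving compose files; treat the exact placement as an assumption, and a line in the .env file works as well):

```yaml
services:
  clearml-serving-triton:
    environment:
      # Extra pip packages installed at container startup,
      # before the serving process runs
      CLEARML_EXTRA_PYTHON_PACKAGES: "azure-storage-blob"
```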
Models which are stored on the ClearML server (created by Task.init(..., output_uri=True)) run perfectly, while models which are stored on Azure blob storage produce different problems in different scenarios:
test_model_pytorch': failed to open text file for read /models/test_model_pytorch/config.pbtxt: No such file or directory
Side note: the same problem occurs when hosting the containers on Windows and on Linux. All Azure credentials are successfully set up as environment variables in the 'clearml-serving-inference', 'clearml-serving-triton' and 'clearml-serving-statistics' containers.