allegroai / clearml

ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
https://clear.ml/docs
Apache License 2.0

Too long task name during Hyperparameter Optimization #516

Status: Closed (421psh closed this issue 2 years ago)

421psh commented 2 years ago

Hello, I really like your product, but I have run into a problem saving models during hyperparameter optimization.

I run hyperparameter optimization for weeks at a time, training hundreds of models with dozens of different hyperparameters. I use MinIO for file storage. Because of the many overridden parameters, the new job names become very long, and the upload path ends up looking something like this:

"mybucket/Traffic lights classification/Train%3A General%2Fgeneral%2Fbatch_size_train=192 General%2Fgeneral%2Fmodel_name=resnet18 General%2Fimage_processing%2Faugmentation_version=2 General%2Ftrain%2Flearning_rate=0.00951 General%2Ftrain%2Floss_function=HuberLoss General%2Ftrain%2Foptimizer=yogi.1b46427f3a79406dbab354a71b410a88/models/epoch=037-val_epoch_score=0.926.ckpt".

MinIO limits each '/'-separated object name segment to a length of 255 (https://github.com/minio/minio/blob/master/docs/minio-limits.md#object-name-restrictions-on-minio). As a result, I get the error "An error occurred (XMinioInvalidObjectName) when calling the CompleteMultipartUpload operation: Object name contains unsupported characters." and my models are not uploaded. I have verified that the characters themselves are not the problem; the error is caused by the length of the task name.
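To illustrate the limit described above, here is a minimal sketch that lists the segments of an object name that exceed 255. The helper name `overlong_segments` is hypothetical, and measuring the length in UTF-8 bytes is an assumption (for ASCII-only names, bytes and characters are the same).

```python
MAX_SEGMENT_LENGTH = 255  # per-segment limit quoted from the MinIO limits document


def overlong_segments(object_name, limit=MAX_SEGMENT_LENGTH):
    """Return the '/'-separated segments of object_name that are longer than limit.

    Assumption: the length is measured in UTF-8 bytes.
    """
    return [
        segment
        for segment in object_name.split("/")
        if len(segment.encode("utf-8")) > limit
    ]
```

Calling it on the full upload path before uploading shows which segment (here, the one containing the generated task name) would be rejected.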

It might make sense to limit or change which parameters take part in name generation: https://github.com/allegroai/clearml/blob/24464b7c1019f7a7b3149ecb80a379c5f82337a0/clearml/automation/optimization.py#L720 Perhaps naming_function could be made a public attribute of the SearchStrategy class, so that it can be changed through the HyperParameterOptimizer class?
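As a rough illustration of the kind of naming function I have in mind, here is a sketch that caps the length of the generated name. It is not ClearML code: `short_task_name`, its parameters, and the `max_length=100` default are all hypothetical, and it assumes `max_length` is comfortably larger than the 9 characters reserved for the suffix.

```python
import hashlib


def short_task_name(base_name, overrides, max_length=100):
    """Build a task name from base_name and parameter overrides, capped at max_length.

    Hypothetical helper, not part of ClearML.
    """
    # Keep only the last component of each parameter name,
    # e.g. "General/train/optimizer" -> "optimizer".
    parts = [
        "{}={}".format(key.rsplit("/", 1)[-1], value)
        for key, value in sorted(overrides.items())
    ]
    full_name = "{}: {}".format(base_name, " ".join(parts))
    if len(full_name) <= max_length:
        return full_name
    # Too long: truncate and append a short digest of the full name so that
    # different parameter sets still produce different names.
    digest = hashlib.sha1(full_name.encode("utf-8")).hexdigest()[:8]
    return "{}~{}".format(full_name[: max_length - len(digest) - 1], digest)
```

Dropping the section prefix makes names shorter but can make two parameters with the same last component look alike; the digest is computed from the shortened names, so it does not resolve that case.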

jkhenning commented 2 years ago

Hi @421psh,

A fix for this issue was pushed to the repo. Can you install from the master branch and verify that it works for you?

421psh commented 2 years ago

@jkhenning Thank you very much for the quick fix. All models are now saved correctly.