allegroai / clearml

ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
https://clear.ml/docs
Apache License 2.0
5.59k stars 646 forks

Add Progress Bar for downloading (and maybe uploading) artifacts #1307

Open EladDvash opened 1 month ago

EladDvash commented 1 month ago

Proposal Summary

Add a progress bar for downloading artifacts, models, and large files from all remote sources (S3, Azure buckets, GCP buckets, network drives). Consider doing the same for uploads, though that might be a problem when uploads are non-blocking, because of terminal spam.

Motivation

When downloading models from previous experiments as artifacts for fine-tuning, or as pretrained models, it would be great to know how far the download has progressed, since these files can take several minutes to fetch. Currently the log only says that the download started, so you have no idea when it might finish or whether it has stalled.

eugen-ajechiloae-clearml commented 1 month ago

Hi @EladDvash! Downloads and uploads made through clearml do have a progress bar, which relies on tqdm. Make sure tqdm is installed in your environment. Also, by default we do not create a progress bar for files/objects smaller than 5MB (configurable here: https://clear.ml/docs/latest/docs/configs/clearml_conf/#sdkstoragelog)
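
As a rough illustration of the behaviour described above (not clearml's actual implementation), a progress reporter could skip small transfers and stay silent when tqdm is missing. The names `should_report`, `make_progress_callback` and `MIN_REPORT_SIZE_MB` are hypothetical, and the 5MB value simply mirrors the default mentioned in this comment.

```python
try:
    from tqdm import tqdm  # optional dependency; no bar is shown without it
except ImportError:
    tqdm = None

# Hypothetical constant; mirrors the 5MB default mentioned above.
MIN_REPORT_SIZE_MB = 5


def should_report(total_bytes, min_size_mb=MIN_REPORT_SIZE_MB):
    """Return True when a transfer is large enough to get a progress bar."""
    return total_bytes >= min_size_mb * 1024 * 1024


def make_progress_callback(total_bytes, min_size_mb=MIN_REPORT_SIZE_MB):
    """Return (callback, close).

    callback(n) should be called with the number of bytes just transferred.
    Both functions are no-ops when tqdm is unavailable or the file is small.
    """
    if tqdm is None or not should_report(total_bytes, min_size_mb):
        return (lambda n: None), (lambda: None)
    bar = tqdm(total=total_bytes, unit="B", unit_scale=True)
    return bar.update, bar.close
```

A caller would invoke `callback(len(chunk))` after each chunk and `close()` once the transfer ends.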

EladDvash commented 1 month ago

Thanks, I see that clearml supports a progress bar, but sadly not for GCP buckets. In storage/helper.py, the "download_object" function of "_GoogleCloudStorageDriver" accepts a callback argument but never uses it, and the "download_object_as_stream" function raises NotImplementedError. Those are the two places where the other drivers use the progress bar callback.
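
For illustration only, the missing piece amounts to streaming the object in chunks and invoking the callback once per chunk. The sketch below uses plain file-like objects rather than the Google Cloud Storage client, and `download_with_callback` is a hypothetical helper, not a function in storage/helper.py. It also assumes the callback receives the size of each chunk rather than a running total.

```python
import io


def download_with_callback(src, dst, callback=None, chunk_size=1024 * 1024):
    """Copy src to dst in chunks, calling callback(chunk_length) after each write.

    Returns the total number of bytes copied.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # empty read means end of stream
            break
        dst.write(chunk)
        total += len(chunk)
        if callback is not None:
            callback(len(chunk))
    return total


# In-memory streams stand in for the remote object and the local file.
seen = []
copied = download_with_callback(
    io.BytesIO(b"x" * 2500), io.BytesIO(), callback=seen.append, chunk_size=1000
)
```

Here `seen` collects the per-chunk sizes, so a progress bar's update method could be passed in its place.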