Kipok / NeMo-Skills

A pipeline to improve skills of large language models
https://kipok.github.io/NeMo-Skills/
Apache License 2.0
185 stars 41 forks

Display progress in summarize_results download #147

Kipok closed 1 week ago

Kipok commented 1 month ago

When downloading a large set of metrics (e.g. majority@256), the summarize_results script might run for a long time without reporting any progress. It would be great to figure out how to print the underlying ssh logs instead, or to have another way to display a progress bar.
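One possible shape for the progress-bar option, as a minimal sketch rather than the project's actual code: it assumes the transfer layer can call a function with `(bytes_transferred, total_bytes)`, which is the callback form paramiko's `SFTPClient.get` accepts. The names `format_progress` and `print_progress` are hypothetical.

```python
import sys


def format_progress(transferred: int, total: int, width: int = 30) -> str:
    """Render a one-line text progress bar for a byte transfer."""
    # Treat an empty transfer as complete and clamp to [0, 1] for safety.
    fraction = transferred / total if total > 0 else 1.0
    fraction = min(max(fraction, 0.0), 1.0)
    filled = int(width * fraction)
    bar = "#" * filled + "-" * (width - filled)
    return f"[{bar}] {fraction * 100:5.1f}% ({transferred}/{total} bytes)"


def print_progress(transferred: int, total: int) -> None:
    """Hypothetical callback: redraw the bar in place on stderr."""
    # '\r' returns to the start of the line so the bar overwrites itself.
    sys.stderr.write("\r" + format_progress(transferred, total))
    if transferred >= total:
        sys.stderr.write("\n")
    sys.stderr.flush()
```

If the download goes through something that does not expose such a callback, this would not apply directly, and streaming the underlying ssh output would be the other route.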

shtoshni commented 2 weeks ago

BTW, another issue right now is that the download command writes a tarball to a location where it may not have write permission.

https://github.com/Kipok/NeMo-Skills/blob/fc33865eea3ed93ffd7e0301a1438f5fca8b1eab/nemo_skills/pipeline/utils.py#L252

Do you think a hardcoded directory, say "/workspace", might be a place where we can always write?

Kipok commented 2 weeks ago

Well, it's a bit tricky, since we don't force users to have a /workspace mount in their configs. How about we instead add a new optional argument, --remote_tar_path, that specifies where on the cluster (or locally) to write the tar? If present, it would override the default location.
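A rough sketch of how such an override could behave, using plain `argparse` for illustration. Only the `--remote_tar_path` flag name comes from the proposal above; the `results_dir` positional, the `resolve_tar_path` helper, and the default of writing the tar next to the results directory are all assumptions and may not match what the script does today.

```python
import argparse
import posixpath
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Illustrative summarize_results-style CLI")
    parser.add_argument("results_dir", help="Directory with the metrics to download")
    parser.add_argument(
        "--remote_tar_path",
        default=None,
        help="Optional location for the temporary tar; overrides the default",
    )
    return parser


def resolve_tar_path(results_dir: str, remote_tar_path: Optional[str] = None) -> str:
    """Return the explicit tar path if given, else an assumed default."""
    if remote_tar_path:
        return remote_tar_path
    # Assumed default: place the tar alongside the results directory.
    return posixpath.normpath(results_dir) + ".tar.gz"


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(resolve_tar_path(args.results_dir, args.remote_tar_path))
```

Keeping the flag optional means existing invocations keep their current behavior, and only users without write access to the default location need to pass it.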

shtoshni commented 2 weeks ago

Cool! Let me work with that for now.