NVIDIA / NeMo-Run

A tool to configure, launch and manage your machine learning experiments.
Apache License 2.0

Can we remove the "title" parameter from experiment and only keep "id"? #9

Open Kipok opened 3 months ago

Kipok commented 3 months ago

Right now, the experiment accepts both title and id parameters, which allows grouping multiple jobs under the same "title". I would suggest changing this and keeping only a single "id" parameter, to simplify some workflows and make the code easier to maintain. Here are some points why I think it will be more convenient:

ShriyaPalsamudram commented 3 months ago

I'm worried that dropping the unique id from the experiment could lead to overwriting old logs or the old state. If, say, we allow using the same directory without an id for launching 2 different runs, and each of them uses different containers/code, then there's no good way to find that out later.

For checkpoint resume, just using a separate shared folder for checkpoints outside of the experiment directory can achieve that.

Kipok commented 3 months ago

Right, but then we wouldn't be able to track the artifacts using nemo run. E.g. if I want to add some functionality to download the results or maybe to initialize another experiment with the current one, it's going to be harder to do than if I keep everything inside the workspace folder.

Another point to add is that if I'm running 2 experiments with the same name and it's indeed an error, I'd rather know about that and not have nemo-run silently create a new folder for me. Maybe I didn't actually mean to run that experiment? If you think that default behavior should be to error out, I'm fine with that as long as we add some parameter to allow for override, so that we can use it in our workflows. Would that be a good approach?
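A minimal sketch of the "error by default, allow override" behavior described above. The helper name `prepare_experiment_dir` and the `allow_existing` parameter are hypothetical and only illustrate the idea; they are not part of NeMo-Run's API.

```python
from pathlib import Path


def prepare_experiment_dir(base_dir, name, allow_existing=False):
    """Hypothetical helper (not NeMo-Run API): create or reuse an experiment directory.

    By default, an existing directory is treated as an error so that an
    accidental name clash is reported instead of being silently handled.
    """
    exp_dir = Path(base_dir) / name
    if exp_dir.exists() and not allow_existing:
        raise FileExistsError(
            f"Experiment directory {exp_dir} already exists; "
            "pass allow_existing=True to reuse it."
        )
    # exist_ok=True covers the explicit-override case.
    exp_dir.mkdir(parents=True, exist_ok=True)
    return exp_dir
```

With this shape, a workflow that intentionally reuses a folder (e.g. for checkpoint resume) would pass `allow_existing=True`, while everyone else gets an error on a duplicate name.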

Kipok commented 3 months ago

Another option to consider is to make the exp name optional, so that when it's not supplied, something unique like the current time string is used. This way we don't put a restriction on users to always create a unique name if they just want to use nemo.run to launch jobs and don't care where exactly the metadata is stored, while still reusing the folder if an experiment name is specified.
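Roughly, the optional-name idea could look like the sketch below. `resolve_experiment_dir`, the `run-` prefix, and the timestamp format are all assumptions made for illustration, not existing NeMo-Run behavior.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def resolve_experiment_dir(
    base_dir, name: Optional[str] = None, now: Optional[datetime] = None
) -> Path:
    """Hypothetical helper (not NeMo-Run API): pick the experiment directory.

    - No name given: fall back to a time-based name, so each launch gets
      its own folder without the user having to invent one.
    - Name given: always map to the same folder, so it is reused.
    """
    if name is None:
        # `now` is injectable only to make the fallback testable.
        stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
        name = f"run-{stamp}"
    exp_dir = Path(base_dir) / name
    exp_dir.mkdir(parents=True, exist_ok=True)
    return exp_dir
```

Note that a second-resolution timestamp can still collide if two unnamed runs start within the same second; a real implementation would probably need a stronger uniqueness guarantee.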

marcromeyn commented 3 months ago

I think I would prefer @Kipok's suggestion to make the experiment-name optional (NeMo does the same). The experiment-dir should be self-contained, so that people can move it; we should therefore also have an Experiment.from_path classmethod.
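To illustrate what "self-contained" plus a `from_path` classmethod might mean, here is a small sketch. The class `ExperimentSketch`, the `experiment.json` file name, and the metadata layout are hypothetical stand-ins, not the actual NeMo-Run `Experiment` implementation.

```python
import json
from dataclasses import dataclass
from pathlib import Path

# Hypothetical metadata file name, stored inside the experiment directory.
METADATA_FILE = "experiment.json"


@dataclass
class ExperimentSketch:
    """Illustrative stand-in for an experiment; not the NeMo-Run class."""

    name: str
    path: Path

    def save(self) -> None:
        # Only location-independent data is written, so the directory
        # can be moved without invalidating its metadata.
        self.path.mkdir(parents=True, exist_ok=True)
        (self.path / METADATA_FILE).write_text(json.dumps({"name": self.name}))

    @classmethod
    def from_path(cls, path) -> "ExperimentSketch":
        # Rebuild the experiment from whatever directory it lives in now.
        path = Path(path)
        meta = json.loads((path / METADATA_FILE).read_text())
        return cls(name=meta["name"], path=path)
```

The key design choice in this sketch is that no absolute paths are stored in the metadata, so loading works from wherever the directory currently lives.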