facebookresearch / hydra

Hydra is a framework for elegantly configuring complex applications
https://hydra.cc
MIT License

[Feature Request] Conditionally disable the creation of folders in output/multirun via the CLI #1937

Open armandgurgu23 opened 2 years ago

armandgurgu23 commented 2 years ago

🚀 Feature Request

Add a CLI flag that prevents the creation of timestamped folder structures inside of outputs/ and multirun/ when executing a script decorated using @hydra.main().

Motivation

Before anything else, I would like to thank the developers of Hydra for creating such a useful and versatile tool :). I'm relatively new to Hydra and I have been adopting it as part of my ML experimentation lifecycle, because it automatically creates timestamped directories during code execution 🙂.

However, I have run into a pain point with the feature described above when using Hydra for long-term experimentation (> 1 month). Over time, directories build up in outputs/ and multirun/ (the default folders where Hydra catalogues each execution of your main script). Many of these directories do not contain useful configuration dumps, for example when you are modifying or troubleshooting your main experiment script, which is decorated with Hydra.

Pitch

One quality of life improvement would be to add the ability to conditionally disable the creation of timestamped folders inside of outputs/ and multirun/. I believe the best user experience for this feature would be to build a flag for this behaviour and be able to pass the value of this flag when executing the main script (decorated with hydra.main()) through the command line.

I believe the feature request above would be valuable for experiment organization, since it would prevent creating timestamped folders during script troubleshooting + experiment development.

jieru-hu commented 2 years ago

hi @armandgurgu23 thanks for the kind words, glad to hear Hydra helps!

The timestamped output dir can be easily configured: https://hydra.cc/docs/configure_hydra/workdir/

We will look into supporting disabling the creation altogether, but in the meantime you can easily override the output dir for your experimental runs, for example to a tmp folder:

python myapp.py hydra.run.dir=/tmp

Jasha10 commented 2 years ago

Note that hydra.run.dir does not apply in multirun mode; hydra.sweep.dir must be used in multirun mode.
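To make the run/multirun distinction concrete, here is a minimal sketch of a launcher helper that picks the right override key for each mode. The helper names `scratch_overrides` and `build_command` are hypothetical and not part of Hydra; the only Hydra-specific pieces are the `hydra.run.dir` and `hydra.sweep.dir` keys and the `-m` flag mentioned in this thread.

```python
import sys
import tempfile
from typing import List


def scratch_overrides(multirun: bool, scratch_dir: str) -> List[str]:
    """Build Hydra CLI overrides that redirect output to scratch_dir.

    Hypothetical helper, not part of Hydra. It only encodes the rule above:
    hydra.run.dir applies to single runs, hydra.sweep.dir to multirun (-m).
    """
    if multirun:
        return ["-m", f"hydra.sweep.dir={scratch_dir}"]
    return [f"hydra.run.dir={scratch_dir}"]


def build_command(script: str, multirun: bool, scratch_dir: str) -> List[str]:
    """Assemble the argv for launching `script` with the scratch override."""
    return [sys.executable, script, *scratch_overrides(multirun, scratch_dir)]


if __name__ == "__main__":
    scratch = tempfile.gettempdir()
    # Print the two command lines instead of running them, so this sketch
    # works even where Hydra or myapp.py is not available.
    print(" ".join(build_command("myapp.py", multirun=False, scratch_dir=scratch)))
    print(" ".join(build_command("myapp.py", multirun=True, scratch_dir=scratch)))
```

The resulting list can be passed to `subprocess.run` if you want to launch the app from a wrapper script.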

Let me follow up on Jieru's comment with an example using config groups. Here is the idea:

$ python app.py save=timestamp  # use timestamped output folders
$ python app.py save=tmp        # use a directory called output/scratch
$ # same for multirun:
$ python app.py -m save=timestamp
$ python app.py -m save=tmp

This can be achieved with the following config files:

Here is conf/config.yaml:

defaults:
  - save: tmp
  - _self_

Here is conf/save/tmp.yaml:

# @package _global_
hydra:
  run:
    dir: output/scratch
  sweep:
    dir: multirun/scratch

Here is conf/save/timestamp.yaml:

# empty

If you repeatedly use the save=tmp setting, the contents of output/scratch from previous app runs will be overwritten.

Given that the file conf/save/timestamp.yaml is empty, the following are equivalent:

$ python app.py save=timestamp
$ python app.py save=null

Given that the setting save: tmp appears in the defaults list of the primary config file, the following are equivalent:

$ python app.py
$ python app.py save=tmp

This is to say that the scratch directory output/scratch is used by default. This could be changed by instead using the setting save: timestamp in the defaults list of the primary config.

Jasha10 commented 2 years ago

We will look into supporting disabling the creation altogether

Sounds good. I think there will be some interaction with the new hydra.job.chdir setting: if chdir=True then we need to call os.chdir(output_dir), so in that case we can't disable the creation of the output directory.
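The constraint can be illustrated with plain Python, independent of Hydra: `os.chdir` fails if the target directory does not exist, so any mode that changes into the output directory has to create it first. This is a minimal sketch; `enter_output_dir` is a hypothetical helper, not Hydra's actual implementation.

```python
import os
import tempfile


def enter_output_dir(output_dir: str, create: bool) -> bool:
    """Try to chdir into output_dir and report whether it worked.

    Hypothetical helper for illustration only. With create=False the
    directory is never made, so os.chdir raises FileNotFoundError when
    it is missing and the working directory stays unchanged.
    """
    if create:
        os.makedirs(output_dir, exist_ok=True)
    try:
        os.chdir(output_dir)
    except FileNotFoundError:
        return False
    return True


if __name__ == "__main__":
    original_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as base:
        target = os.path.join(base, "outputs", "run")
        print(enter_output_dir(target, create=False))  # directory is missing
        print(enter_output_dir(target, create=True))   # directory is created first
        os.chdir(original_cwd)  # leave the temp dir before it is deleted
```

A "no output directory" option would therefore only be meaningful when the job does not change its working directory.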

jieru-hu commented 2 years ago

Keeping this open for now to see if there's more interest in disabling the creation of the working dir.