TUM-DAML / seml

SEML: Slurm Experiment Management Library
Other
165 stars 29 forks source link

Change slurm config inheritance to default -> template -> experiment #109

Closed heborras closed 1 year ago

heborras commented 1 year ago

Reference issue

No issue has been created yet.

What does this implement/fix?

I've recently started working with the sbatch_options_template option and noticed that the way it inherits configuration from the default settings is strange. Currently, when defining a sbatch_options_template the add command will first merge the default and the experiment config and then pass along options from the template, if they were not already defined before. See this code on the main branch: https://github.com/TUM-DAML/seml/blob/e340dc4de0c472839fdda811e8b90d69fd6e54e0/seml/add.py#L142-L152 In practice this leads to the following inheritance scheme: template -> default -> experiment Meaning that the default configuration will over-ride any template options. This makes the feature not that useful in practice, since if one wants to use templates one will need to setup the default config to contain almost no options and redefine a bunch of stuff for each template.

Instead I would expect the templates to follow the following inheritance: default -> template -> experiment This is done in this PR.

Comments, feedback or insights into why this is currently done the way it is are more than welcome.

n-gao commented 1 year ago

Thanks a lot for the PR! This indeed seems like undesired behavior. Would it be possible to write a unit test for this such we can be sure that the behavior is as intended and doesn't change in future iterations?

heborras commented 1 year ago

Hm, yeah that would be nice indeed. However, I'm not sure how to test the add_config_file function without an active mongodb connection. Because without a working mongodb configuration the function will likely crash with a connection error: https://github.com/TUM-DAML/seml/blob/e340dc4de0c472839fdda811e8b90d69fd6e54e0/seml/add.py#L116

n-gao commented 1 year ago

I see. We should potentially reduce large functions in the first place. So, a solution would be to move the configuration merging into a separate function. At a later time, this should be considered for more parts such that we can get larger test coverage.

heborras commented 1 year ago

Okay, I restructured the code as suggested and added some tests to check that the inheritance is correct. I've also found a bug where the read_config function would crash if either the slurm or the sbatch_options key were empty in the .yaml config file. The bug is now fixed in 84d000866808563117ae1eb6624eb5d9cc49c65b in such a way that if these keyse were empty in the config file they are converted to empty dictionaries, this is also covered by the tests.

n-gao commented 1 year ago

Except for the small code consistencies, this looks good to me! Thanks a lot for your great help :)

heborras commented 1 year ago

Nice, I've removed the unnecessary parentheses.