facebookresearch / dora

Dora is an experiment management framework. It expresses grid searches as pure Python files that live in your repo. It identifies experiments with a unique hash signature. Scale up to hundreds of experiments without losing your sanity.
MIT License

Moving a running xp from one grid to another (e.g. when refactoring) cancels the XP #31

Open · opened by louismartin 2 years ago

louismartin commented 2 years ago

I had a grid, grid_a, with too many experiments, so I moved some of them into a new grid file, grid_b, while they were still running. Running the new grid with dora grid grid_b worked as expected: it found the experiments that were already running. But when I ran dora grid grid_a again, it cancelled all the experiments that now belonged to grid_b.

It would be nice to detect this scenario and only garbage collect experiments that are not linked to any grid. It would also be nice to ask the user for confirmation before cancelling experiments.
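A minimal sketch of the two requests above, assuming each grid can know (a) the hash signatures it launched previously, (b) the signatures it currently defines, and (c) the signatures every other grid currently defines. The function names and the `grids` mapping are hypothetical and not part of Dora's API. An experiment is garbage collected only if this grid launched it, it has left this grid, and no other grid claims it; the user is prompted before anything is cancelled.

```python
from typing import Callable, Dict, Set


def experiments_to_cancel(
    grid_name: str,
    previously_launched: Set[str],
    current_grid: Set[str],
    grids: Dict[str, Set[str]],  # hypothetical: grid name -> signatures it currently defines
) -> Set[str]:
    """Return the signatures that `grid_name` may safely garbage collect."""
    # Experiments this grid launched before but no longer defines.
    stale = previously_launched - current_grid
    # Experiments currently claimed by any *other* grid must be kept.
    claimed_elsewhere: Set[str] = set()
    for name, sigs in grids.items():
        if name != grid_name:
            claimed_elsewhere |= sigs
    return stale - claimed_elsewhere


def confirm_and_cancel(
    to_cancel: Set[str],
    cancel: Callable[[str], None],  # hypothetical hook that cancels one experiment
    ask: Callable[[str], str] = input,
) -> bool:
    """Ask the user before cancelling; return True only if cancellation ran."""
    if not to_cancel:
        return False
    answer = ask(f"Cancel {len(to_cancel)} experiment(s) {sorted(to_cancel)}? [y/N] ")
    if answer.strip().lower() not in ("y", "yes"):
        return False
    for sig in sorted(to_cancel):
        cancel(sig)
    return True
```

With grid_a previously launching {"a1", "a2", "b1"}, now defining {"a1", "a2"}, and grid_b defining {"b1"}, `experiments_to_cancel` returns an empty set, so the moved experiment survives. Defaulting the prompt to "no" means an accidental Enter keeps the experiments running.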

adefossez commented 2 years ago

Yep, my take was that the marginal value of garbage collection wasn't worth the complexity it adds to the code. Keep in mind there is no real DB, just files over NFS. If you have an easy and reasonably reliable solution, I would be happy to take it.
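Since there is no database, one low-complexity way to answer "which grid claims this experiment?" is a directory holding one small file per grid, listing that grid's current signatures. This is a hypothetical sketch, not how Dora stores state: the directory layout and function names are assumptions. Each grid only ever writes its own file, so grids never contend for the same file, and each write goes to a temp file that is then renamed over the target. That avoids partially written files only if rename is atomic on the shared filesystem, which is usually but not universally the case on NFS.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Dict, Iterable, Set


def record_grid(root: Path, grid_name: str, signatures: Iterable[str]) -> None:
    """Overwrite this grid's membership file with its current signatures."""
    root.mkdir(parents=True, exist_ok=True)
    target = root / f"{grid_name}.json"
    # Write to a temp file in the same directory, then rename it over the target,
    # so readers see either the old or the new list, never a half-written one
    # (assuming rename is atomic on the shared filesystem).
    fd, tmp = tempfile.mkstemp(dir=root, prefix=f".{grid_name}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(sorted(signatures), f)
        os.replace(tmp, target)
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise


def load_all_grids(root: Path) -> Dict[str, Set[str]]:
    """Map each grid name to the set of signatures it currently claims."""
    grids: Dict[str, Set[str]] = {}
    if not root.is_dir():
        return grids
    for path in root.glob("*.json"):  # temp files end in .tmp and are skipped
        with open(path) as f:
            grids[path.stem] = set(json.load(f))
    return grids
```

Before garbage collecting, a grid would call `record_grid` for itself and then consult `load_all_grids` to see which signatures other grids still claim. A stale file from a deleted grid would keep its experiments alive, which errs on the side of not cancelling.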