Automatically running a folder of experiment scripts to completion and reporting on progress along the way

I don't have this one ready to go, but it exists in an ugly form right now and will need a lot of work before it's ready. The idea is:

1. Have a folder filled with experiment scripts.
2. Use a unique 'query term' that identifies the subset of experiments you want to run.
3. All matching experiment scripts are found and queued to run on however many resources you define (e.g. 2 GPUs). Jobs are run, and a progress bar shows the current epoch and elapsed time for each running job, along with its status (running, done, not started, waiting to resume, etc.).
4. The last piece of the puzzle is a simple system that lets other people change how many GPUs you are experimenting over. If someone needs a GPU and you are using 5, they can change your allocation to 4. That kills one of your jobs, to be continued when resources are available, and frees a GPU for the new user. (I implemented this for myself last year so I wouldn't get in Elliot's way.)
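
To make the idea above concrete, here is a rough sketch of the bookkeeping only; it is not the existing implementation. Everything in it is an assumption made for illustration: the names `find_scripts`, `read_allocation`, `Job` and `rebalance`, matching the query term against file names, and storing the allocation in a small JSON file with a `num_gpus` key that other users can edit. Launching and killing the actual processes, checkpointing, and the progress bar are left out.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


def find_scripts(folder, query_term):
    """Return the files in `folder` whose name contains `query_term`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir() if p.is_file() and query_term in p.name
    )


def read_allocation(path, default):
    """Read the GPU allocation from a JSON file (hypothetical format: {"num_gpus": 4}).

    Falls back to `default` if the file does not exist.
    """
    path = Path(path)
    if not path.exists():
        return default
    return int(json.loads(path.read_text())["num_gpus"])


@dataclass
class Job:
    script: str
    status: str = "not started"  # not started | running | waiting to resume | done
    gpu: Optional[int] = None


def rebalance(jobs, num_gpus):
    """Fit the running jobs into GPUs 0..num_gpus-1.

    Returns (to_stop, to_start). The caller is expected to kill the jobs in
    `to_stop` and launch or resume the jobs in `to_start`.
    """
    # Jobs that were already pending before this call; preempted jobs are
    # resumed before untouched ones are started.
    pending = [j for j in jobs if j.status == "waiting to resume"]
    pending += [j for j in jobs if j.status == "not started"]

    # Allocation shrank: preempt anything running on a GPU we no longer own.
    to_stop = []
    for job in jobs:
        if job.status == "running" and job.gpu >= num_gpus:
            job.status = "waiting to resume"
            job.gpu = None
            to_stop.append(job)

    # Hand any free GPU to the next pending job.
    used = {j.gpu for j in jobs if j.status == "running"}
    free = [g for g in range(num_gpus) if g not in used]
    to_start = []
    for gpu, job in zip(free, pending):
        job.status = "running"
        job.gpu = gpu
        to_start.append(job)
    return to_stop, to_start
```

The intended use would be a loop that calls `read_allocation` every few seconds, marks finished jobs as `"done"`, and then calls `rebalance`. Lowering the number in the allocation file from 5 to 4 would put the job on GPU 4 into `to_stop`, and it would be picked up again once a GPU frees up.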

I'll try to have the above done by next Friday, but this one will need a lot of work. I believe it's important to do, especially point 4, so we won't get in each other's way.

BayesWatch / pytorch-experiments-template