Multiple runs at once & experiment tracking

Currently it's not possible to run multiple training experiments at once because of how the scripts are set up. Fixing this is probably a relatively easy 1-2 day job.

A bigger pain point is experiment tracking and management. Wandb is decent, but very expensive if we want to use it seriously. Even with wandb, we need to add metadata to keep track of things like which dataset was used and which transformation steps were applied to it.
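A minimal sketch of the kind of per-run metadata described above follows. It makes several assumptions: the names `RunMetadata`, `write_metadata`, the `runs_root` layout, and the `metadata.json` filename are all hypothetical and not part of the current scripts. Each run gets its own directory keyed by a random run ID, which also avoids output collisions when several runs execute at once. It records the dataset and the ordered list of transformation steps, and adds a short hash of those steps so that runs on identically processed data can be grouped.

```python
import hashlib
import json
import tempfile
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class RunMetadata:
    """Hypothetical per-run record: which dataset and which transforms produced this run."""
    dataset_name: str
    dataset_path: str
    transforms: list[str]  # ordered transformation steps applied to the dataset
    run_id: str = field(default_factory=lambda: uuid.uuid4().hex[:8])
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def transform_fingerprint(self) -> str:
        # Stable short hash of the ordered transform list; identical pipelines
        # produce identical fingerprints, regardless of run_id.
        joined = "\n".join(self.transforms).encode("utf-8")
        return hashlib.sha256(joined).hexdigest()[:12]


def write_metadata(meta: RunMetadata, runs_root: Path) -> Path:
    # One directory per run, so concurrent runs never write to the same place.
    # exist_ok=False makes an (unlikely) run_id collision fail loudly.
    run_dir = runs_root / meta.run_id
    run_dir.mkdir(parents=True, exist_ok=False)
    record = asdict(meta)
    record["transform_fingerprint"] = meta.transform_fingerprint()
    out_path = run_dir / "metadata.json"
    out_path.write_text(json.dumps(record, indent=2))
    return out_path


if __name__ == "__main__":
    # Demo using a throwaway directory; dataset name and steps are placeholders.
    with tempfile.TemporaryDirectory() as tmp:
        meta = RunMetadata("example-dataset", "data/example", ["dedupe", "tokenize"])
        path = write_metadata(meta, Path(tmp))
        print(path.read_text())
```

If wandb stays in use, the same `record` dict could also be passed as the `config` argument to `wandb.init`, so the metadata is searchable there too; the local JSON file keeps it available without depending on wandb.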