Dataset loading for eval

Addresses the issue of data organization during model eval (discussed in Slack) via two changes:

Stratifies the train-test split when generating input JSONs (generate_dataset_in_json.py) so that the training and validation sets have the same distribution of detector types. This avoids the case where the two sets differ fundamentally in composition (e.g. significantly more Rayonix samples in one set than in the other).
Randomizes the order in which events across all experiments are yielded by the IPCDistributedSegmentedDataset entry generator (i.e. events from a single experiment run no longer appear consecutively). This avoids the situation where a sampler draws events only from the same experiment/run, and it also adds diversity in how many events from the early/middle/late stages of a run appear in a dataset sample.
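
For reviewers who want a concrete picture of the two changes, here is a minimal standard-library sketch of the ideas. It is not the code in generate_dataset_in_json.py or IPCDistributedSegmentedDataset; the function names (`stratified_split`, `shuffled_event_index`), the record layout, the `detector` key, and the 80/20 fraction are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(records, key, frac_train=0.8, seed=0):
    """Split records so every group (by `key`) has about the same share in both sets.

    Hypothetical helper: `records` is assumed to be a list of dicts.
    """
    rng = random.Random(seed)

    # Group records by the stratification key (e.g. detector type).
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)

    train, val = [], []
    for name in sorted(groups):  # sorted for a reproducible group order
        members = groups[name]
        rng.shuffle(members)
        n_train = round(len(members) * frac_train)
        train.extend(members[:n_train])
        val.extend(members[n_train:])
    return train, val


def shuffled_event_index(runs, seed=0):
    """Flatten (run_id, event_idx) pairs over all runs, then shuffle globally.

    Hypothetical helper: `runs` is assumed to map a run id to its event count.
    """
    rng = random.Random(seed)
    index = [
        (run_id, event_idx)
        for run_id, num_events in runs.items()
        for event_idx in range(num_events)
    ]
    # A single global shuffle breaks up consecutive events from the same run.
    rng.shuffle(index)
    return index
```

With ten "Rayonix" records and five records of another detector type, `stratified_split(records, "detector")` puts eight and four of them in the training set respectively, so both sets keep the 2:1 detector ratio. `shuffled_event_index` returns every (run, event) pair exactly once, in an order that no longer follows run boundaries.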

carbonscott / maxie