Instead of loading the dataset from the YAML files (>50k rows each), load it from the merged Parquet files stored at /home/joosep/ml-tau-en-reg/data/20240402_full_stats_merged/ (~2 GB total).
Revert the dataset YAML files to their previous state (they may still be used by the ntuplizer, etc.); they can be simplified later.
Instead of a separate builder step, model inference on the test samples now runs directly after training, writing the results to the output directory.
All models can be trained and evaluated on SLURM via ./enreg/scripts/submit-pytorch-gpu-all.sh.
Use a single unified dataset class for all models for consistency (see the sketch below).
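A minimal sketch of what such a unified dataset class could look like, assuming fixed-length feature vectors per row; the class name (`TauDataset`) and the column names (`features`, `target`) are hypothetical and the actual class in the repo may differ:

```python
import awkward as ak
import torch
from torch.utils.data import Dataset


class TauDataset(Dataset):  # hypothetical name
    def __init__(self, paths):
        # Concatenate all merged Parquet files into one awkward array.
        self.data = ak.concatenate([ak.from_parquet(p) for p in paths])

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        row = self.data[idx]
        # "features" and "target" are placeholder column names; this assumes
        # "features" is a fixed-length vector so it converts cleanly to numpy.
        x = torch.tensor(ak.to_numpy(row["features"]), dtype=torch.float32)
        y = torch.tensor(row["target"], dtype=torch.float32)
        return x, y
```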
The evaluation output files contain only the prediction outputs and targets; they can be combined with the input data as follows (the row order matches the input):
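A minimal sketch of the combination, assuming both the inputs and the evaluation outputs are Parquet files with identical row order; the file name `sample.parquet`, the output path, and the column names (`pred`, `target`) are hypothetical:

```python
import awkward as ak

# Inputs from the merged dataset and the matching evaluation output file.
inputs = ak.from_parquet("/home/joosep/ml-tau-en-reg/data/20240402_full_stats_merged/sample.parquet")
preds = ak.from_parquet("outputs/model/sample.parquet")

# Rows line up one-to-one, so the arrays can be zipped side by side.
combined = ak.zip(
    {"input": inputs, "pred": preds["pred"], "target": preds["target"]},
    depth_limit=1,  # zip only at the outermost (per-row) level
)
```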