Hello! The information you provided in #297 about the validation energy MAEs was very helpful. I was hoping you could help clarify one more thing:
Is it possible to recreate the HDF5 data you have provided as your train/val split? I want to do this to retrieve the mp-ids for each batch. When I looked into it, it seemed difficult because the HDF5 files contain no reference to the .extxyz file each entry came from.
I would like to recreate your dataset exactly because we have computed some dispersion corrections on the original MPTraj dataset, and I would like to use your exact data split so that the comparison between the models is fairer. Do you see a way to make this possible?
If not, do you think this will matter, or have you observed low variance across different randomly held-out splits of MPTraj? I also couldn't quite tell from your pre-processing code whether you hold out entire MPTraj trajectories for your validation set, or whether you treat each trajectory point as independent when splitting the dataset.
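To make the distinction concrete, here is a minimal sketch of the two splitting strategies I have in mind. It is not taken from your pre-processing code: the function names, the `val_fraction` and `seed` parameters, and the placeholder mp-ids are all hypothetical, and it assumes one mp-id label is available per frame.

```python
import random
from collections import defaultdict


def split_by_trajectory(frame_mp_ids, val_fraction=0.1, seed=0):
    """Hold out whole trajectories: every frame sharing an mp-id lands on the same side."""
    # Group frame indices by the (assumed) per-frame mp-id label.
    groups = defaultdict(list)
    for idx, mp_id in enumerate(frame_mp_ids):
        groups[mp_id].append(idx)

    # Shuffle the trajectory ids, not the individual frames.
    ids = sorted(groups)
    random.Random(seed).shuffle(ids)
    n_val = max(1, round(val_fraction * len(ids)))

    val = sorted(i for m in ids[:n_val] for i in groups[m])
    train = sorted(i for m in ids[n_val:] for i in groups[m])
    return train, val


def split_by_frame(frame_mp_ids, val_fraction=0.1, seed=0):
    """Treat every trajectory point as independent: frames of one mp-id may land on both sides."""
    indices = list(range(len(frame_mp_ids)))
    random.Random(seed).shuffle(indices)
    n_val = max(1, round(val_fraction * len(indices)))
    return sorted(indices[n_val:]), sorted(indices[:n_val])


if __name__ == "__main__":
    # Placeholder labels for illustration only, not real dataset entries.
    labels = ["mp-a"] * 3 + ["mp-b"] * 2 + ["mp-c"] * 4 + ["mp-d"]
    train, val = split_by_trajectory(labels, val_fraction=0.25)
    print("train mp-ids:", sorted({labels[i] for i in train}))
    print("val mp-ids:  ", sorted({labels[i] for i in val}))
```

With the first strategy the sets of mp-ids in train and validation never overlap; with the second, near-identical frames from the same relaxation can appear on both sides, which I would expect to lower the validation error.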