atomistic-machine-learning / schnetpack

SchNetPack - Deep Neural Networks for Atomistic Systems

How to train the Schnet model using all the data without using a validation set #589

Closed yuchanpei closed 10 months ago

yuchanpei commented 10 months ago

Hi everyone, I want to use all of my data to train a SchNet model, without using a validation set. I found the following code in the documentation, but when I set num_val to 0, I get the error below.

ethanol_data = MD17(
    os.path.join(forcetut,'ethanol.db'),
    molecule='ethanol',
    batch_size=10,
    num_train=1000,
    num_val=1000,
    transforms=[
        trn.ASENeighborList(cutoff=5.),
        trn.RemoveOffsets(MD17.energy, remove_mean=True, remove_atomrefs=False),
        trn.CastTo32()
    ],
    num_workers=1,
    pin_memory=True, # set to False when not using a GPU
)
Traceback (most recent call last):
  File "test_with_own_data_without_early_stop.py", line 40, in <module>
    data_module.setup()
  File "/pubhome/ycpei02/miniconda3/envs/schnetpach_ampere/lib/python3.8/site-packages/schnetpack/data/datamodule.py", line 183, in setup
    self._load_partitions()
  File "/pubhome/ycpei02/miniconda3/envs/schnetpach_ampere/lib/python3.8/site-packages/schnetpack/data/datamodule.py", line 279, in _load_partitions
    raise AtomsDataModuleError(
schnetpack.data.datamodule.AtomsDataModuleError: If no `split_file` is given, the sizes of the training and validation partitions need to be set!

Can I achieve the corresponding goal by modifying a small amount of code? How should I modify the relevant code? Any help would be much appreciated :)

jnsLs commented 10 months ago

Hi @yuchanpei,

are you sure you want to train your model without a validation set? This will most likely result in overfitting, since you can't use the early-stopping hook. In contrast, specifying an empty test set should be possible.

Since this is not a recommended workflow, specifying an empty validation set would require quite a few modifications to the code: you would have to adapt schnetpack/data/splitting.py and schnetpack/data/datamodule.py.
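[Editor's note] A near-equivalent workaround that avoids modifying SchNetPack itself is to precompute a split file that assigns all but one structure to training and keeps a single placeholder structure as the validation set. This is a sketch, not something from this thread: it assumes the dataset holds 1000 structures, that SchNetPack's data module accepts a precomputed .npz split via its split_file argument, and that the array keys are train_idx / val_idx / test_idx; verify these against schnetpack/data/datamodule.py in your installed version.

```python
import numpy as np

# Assumed total number of structures in ethanol.db; adjust to your dataset.
n = 1000
idx = np.random.default_rng(0).permutation(n)

# Put all but one structure into training; keep a single placeholder
# structure as the "validation" set so the validation loop still runs.
train_idx = idx[:-1]
val_idx = idx[-1:]
test_idx = np.array([], dtype=int)

# Key names are an assumption about SchNetPack's split-file format;
# check them against schnetpack/data/datamodule.py before relying on this.
np.savez("split.npz", train_idx=train_idx, val_idx=val_idx, test_idx=test_idx)

# The file could then be passed via e.g. MD17(..., split_file="split.npz")
# instead of num_train / num_val (hypothetical usage, not tested here).
```

With a one-structure validation set, early stopping becomes meaningless, but the data module setup should no longer fail, and effectively all of the data is used for training.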

Best regards, Jonas

yuchanpei commented 10 months ago

Hi @jnsLs, yes. Due to special circumstances, I need to use all of the data for training. I will monitor the training in real time and closely observe the model's performance on the training data to make sure it does not overfit. Thank you so much for the advice! I will try this and get back to you if I run into further problems :)