As mentioned in #6, training with Rips-Vietoris uses too much memory (>32 GB), especially when setting `dis` to large values. Because `pyg`'s `QM9` is an `InMemoryDataset`, setting `dis` to a large value essentially means holding all 3-combinations of atoms of each molecule in memory, which explodes.
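To put a rough number on it (a back-of-the-envelope worst case, assuming `dis` is large enough that every atom triple of a molecule becomes a 2-simplex; the atom and molecule counts are the usual QM9 figures):

```python
from math import comb

# Worst case at large `dis`: every atom triple of a molecule is a 2-simplex.
max_atoms = 29           # largest QM9 molecule, hydrogens included
num_molecules = 130_831  # graphs in pyg's QM9 (about 130k)

print(comb(max_atoms, 3))                  # 3654 triangles for one molecule
print(comb(max_atoms, 3) * num_molecules)  # ~4.8e8 triangles as an upper bound
```

Hundreds of millions of simplices, each carrying index and feature tensors, have to sit in RAM at once with an `InMemoryDataset`, which is how we end up past 32 GB.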
The best solution I see is to implement the `QM9` dataset ourselves such that it isn't an `InMemoryDataset`. The transformed dataset would be stored on disk and the dataloader would access each batch from disk.
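A rough sketch of what I have in mind, built on `torch_geometric.data.Dataset` (the base class that does not keep everything in memory) instead of `InMemoryDataset`. The class name `DiskQM9`, the `num_graphs.pt` marker file, and the trick of reusing the stock `QM9` inside `process()` as the source of untransformed molecules are placeholders of mine; the Rips-Vietoris transform is assumed to be handed in as `pre_transform`:

```python
import os.path as osp

import torch
from torch_geometric.data import Dataset
from torch_geometric.datasets import QM9


class DiskQM9(Dataset):
    """Sketch of an on-disk QM9: every pre-transformed molecule is saved as
    its own .pt file, so only the graphs of the current batch live in RAM."""

    def __init__(self, root, transform=None, pre_transform=None):
        self._num_graphs = None
        super().__init__(root, transform, pre_transform)

    @property
    def raw_file_names(self):
        # Raw-file handling is delegated to the stock QM9 used in process().
        return []

    @property
    def processed_file_names(self):
        # A single marker file; its presence means processing already ran.
        return ['num_graphs.pt']

    def download(self):
        pass

    def process(self):
        # The untransformed QM9 still fits in memory; only the Rips-Vietoris
        # output blows up, so each transformed graph is written out immediately.
        source = QM9(osp.join(self.root, 'qm9_source'))
        for i, data in enumerate(source):
            if self.pre_transform is not None:
                data = self.pre_transform(data)
            torch.save(data, osp.join(self.processed_dir, f'data_{i}.pt'))
        self._num_graphs = len(source)
        torch.save(self._num_graphs,
                   osp.join(self.processed_dir, 'num_graphs.pt'))

    def len(self):
        if self._num_graphs is None:
            self._num_graphs = torch.load(
                osp.join(self.processed_dir, 'num_graphs.pt'))
        return self._num_graphs

    def get(self, idx):
        # Loaded lazily per sample; Data objects are full Python objects,
        # not plain tensors, hence weights_only=False.
        return torch.load(osp.join(self.processed_dir, f'data_{idx}.pt'),
                          weights_only=False)
```

With the Rips-Vietoris expansion done once in `pre_transform` at preprocessing time, the `DataLoader` only materialises one batch of transformed graphs at a time, so the memory footprint stays bounded regardless of `dis`.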
It is sadly not an option to apply the Rips-Vietoris transform on the fly (before batching). Because it is an expensive algorithm that is applied to each molecule separately, transforming on the fly would increase the runtime of each epoch from 1.5 minutes to 2 hours, meaning 1000 epochs would take 1000 × 2 h ≈ 83 days, i.e. well over a month.