DeepGraphLearning / GearNet

GearNet and Geometric Pretraining Methods for Protein Structure Representation Learning, ICLR'2023 (https://arxiv.org/abs/2203.06125)
MIT License

Asking about the default config option `save_interval: 5` when pretraining on AlphaDB #35

Closed yunxiaoliCB closed 1 year ago

yunxiaoliCB commented 1 year ago

Hi there, I noticed that the default config option `save_interval: 5` (https://github.com/DeepGraphLearning/GearNet/blob/1a1d15dcba393f47d18d18e2e945b895c3a848fa/config/pretrain/mc_esm_gearnet.yaml#L66), when used by the script pretrain.py, makes the model train on one pickled part of AlphaDB (consisting of 220k proteins) for 5 epochs, then on another pickled part for 5 epochs, and so on. (This option also controls the interval at which the model is saved, although that could also be adjusted independently.)

Could you provide a bit of insight into why it is set this way? Is it needed for some practical reason to train a sufficient number of epochs on one pickle before moving on to the next one? Thank you!

Oxer11 commented 1 year ago

Hi, this is a good question. Here we use round-robin sampling for pre-training, since it's impossible to load the entire pre-training dataset into memory. Ideally, we would set `save_interval: 1` so that we iterate over all pickles within each epoch, but it takes time to load each pickle into memory. So we use `save_interval` to balance the data-loading time against the training performance: with a small `save_interval`, we closely approximate the ideal round-robin sampling; with a large `save_interval`, we reduce the time spent loading datasets.
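For intuition, here is a minimal sketch of this schedule (not the actual pretrain.py code; the shard paths and the two callables are stand-ins for the real dataset and training loop):

```python
import itertools
import pickle

def round_robin_pretrain(shard_paths, train_one_epoch, save_checkpoint,
                         num_epochs, save_interval):
    """Cycle through pickled dataset shards, spending `save_interval`
    consecutive epochs on each shard before loading the next one.

    `train_one_epoch` and `save_checkpoint` are caller-supplied callables;
    this only illustrates the schedule, not GearNet's pretrain.py.
    """
    shard_cycle = itertools.cycle(shard_paths)
    epoch = 0
    while epoch < num_epochs:
        # Loading a ~220k-protein pickle is the expensive step we amortize.
        with open(next(shard_cycle), "rb") as f:
            shard = pickle.load(f)
        for _ in range(min(save_interval, num_epochs - epoch)):
            train_one_epoch(shard)
            epoch += 1
        # The same interval also decides when a checkpoint is written.
        save_checkpoint(epoch)

# save_interval = 1 approximates the ideal round-robin sampling (one shard per
# epoch, maximal mixing); a larger value reduces how often shards are reloaded.
```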

Note that if you simply train for many epochs on one pickle and then move to the next one, you may face the catastrophic forgetting problem, i.e., the model forgets the previous pickle after training on new pickles.

yunxiaoliCB commented 1 year ago

Thank you for the explanation! That makes perfect sense.