CederGroupHub / chgnet

Pretrained universal neural network potential for charge-informed atomistic modeling https://chgnet.lbl.gov
https://doi.org/10.1038/s42256-023-00716-3

Out-of-memory when finetuning large datasets with graphs #144

Closed: YouCanNotKnow closed this issue 3 months ago

YouCanNotKnow commented 3 months ago

Hi CHGNet devs, I am trying to finetune a model on the Open Catalyst Project dataset (https://github.com/Open-Catalyst-Project/ocp/blob/main/DATASET.md). I've run into memory problems when converting the dataset into graphs.

I have been following fine_tuning.ipynb and make_graphs.py in the examples folder. I am able to convert the structures into graphs, but because of the scale of the dataset, memory runs out before I can write a single labels.json file.

I can create labels for each individual graph or for smaller batches of the full dataset, but it looks like GraphData in data/dataset.py can only load a single labels.json. Is there a way to batch-load labels into a single dataset, or to merge smaller datasets together? Is there some way to get around the memory problem and train on the full dataset?

BowenD-UCB commented 3 months ago

How large is the OC20 labels file? If the label dictionary itself is too large to fit into memory, a workaround is to save each label independently or in small batches. The dataset object can be initialized with only the keys, and the corresponding inputs and labels can then be loaded from disk on the fly during each epoch.
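
A minimal sketch of that label-saving step (the per-graph file layout and `save_label` helper here are just one possible choice for illustration, not something fixed by CHGNet):

```python
import json
import os


def save_label(labels_dir: str, graph_id: str, label: dict) -> None:
    """Write a single graph's label (e.g. energy/forces) to its own small JSON file."""
    os.makedirs(labels_dir, exist_ok=True)
    with open(os.path.join(labels_dir, f"{graph_id}.json"), "w") as f:
        json.dump(label, f)


# Call save_label(...) right after each graph is converted, so only one
# label ever has to sit in memory at a time.
```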

This will require a modified implementation of the Dataset object, which you can adapt from one of the Dataset classes we provide in dataset.py.
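
A rough sketch of what such a lazy-loading Dataset could look like, assuming the graphs were serialized one per `.pt` file and the labels were saved per graph as above; the directory layout, loader call, label keys, and target names are assumptions for illustration and should be matched to how your graphs and labels were actually written:

```python
from __future__ import annotations

import json
import os

import torch
from torch.utils.data import Dataset


class LazyGraphData(Dataset):
    """Keeps only graph ids in memory; loads each graph and label from disk on demand."""

    def __init__(self, graph_dir: str, labels_dir: str) -> None:
        self.graph_dir = graph_dir
        self.labels_dir = labels_dir
        # Only the keys (file stems) live in memory, never the full label dict.
        self.keys = sorted(
            f.removesuffix(".pt") for f in os.listdir(graph_dir) if f.endswith(".pt")
        )

    def __len__(self) -> int:
        return len(self.keys)

    def __getitem__(self, idx: int):
        key = self.keys[idx]
        # Assumes graphs were saved with torch.save; swap in whichever loader
        # matches how the graphs were written in your conversion script.
        graph = torch.load(os.path.join(self.graph_dir, f"{key}.pt"))
        with open(os.path.join(self.labels_dir, f"{key}.json")) as f:
            label = json.load(f)
        # Target keys ("e", "f") and label fields are placeholders; use whatever
        # your collate function and trainer expect.
        targets = {
            "e": torch.tensor(label["energy_per_atom"]),
            "f": torch.tensor(label["forces"]),
        }
        return graph, targets
```

The point of this design is that only the list of keys is held in RAM, while the actual disk reads happen inside `__getitem__`, so a DataLoader's worker processes can stream inputs and labels batch by batch during training.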