Closed: YouCanNotKnow closed this issue 3 months ago
How large is the OC20 labels file? If the label dictionary itself is too large to fit into memory, a workaround would be to save each label independently or in batches. The dataset object can then be initialized with only the keys, and the corresponding inputs and labels can be loaded from disk on the fly during the epochs. This will require a modified implementation of the Dataset object, which you can adapt from one of the Dataset classes we provide in dataset.py.
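A minimal sketch of such an on-the-fly dataset, assuming labels have been saved as one small JSON file per graph key (the class name, file layout, and label schema here are hypothetical, not part of CHGNet's dataset.py):

```python
import json
import os


class LazyLabelDataset:
    """Hypothetical sketch: keep only graph keys in memory and load each
    sample's label file from disk when it is requested."""

    def __init__(self, keys, label_dir):
        # Only the keys live in memory; labels stay on disk until needed.
        self.keys = list(keys)
        self.label_dir = label_dir

    def __len__(self):
        return len(self.keys)

    def __getitem__(self, idx):
        key = self.keys[idx]
        # Load this one label file on the fly (assumed layout: <key>.json).
        path = os.path.join(self.label_dir, f"{key}.json")
        with open(path) as f:
            label = json.load(f)
        return key, label
```

A real adaptation would also load the saved graph for the same key in `__getitem__` and subclass the appropriate Dataset base from dataset.py, but the memory-saving idea is the same: index by key, read per-sample files lazily.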
Hi CHGNet devs, I am trying to fine-tune a model on the Open Catalyst Project dataset (https://github.com/Open-Catalyst-Project/ocp/blob/main/DATASET.md). I've run into memory problems when converting the dataset into graphs.
I have been following fine_tuning.ipynb and make_graphs.py in examples. I am able to convert the structures into graphs, but due to the scale of the dataset, memory runs out before I can make a labels.json file.
I can create labels for each individual graph or for smaller batches of the full dataset, but it looks like GraphData in data/dataset.py can only load a single labels.json. Is there a way to batch load labels into a single dataset, or to merge smaller datasets together? Some way to get around the memory problem and train on the full dataset?
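One way to sidestep building a single labels.json is to write the labels out in fixed-size batches as they are generated, so the full dictionary never sits in memory at once. A sketch under those assumptions (the function name and file naming scheme are mine, not from make_graphs.py):

```python
import json
import os


def save_labels_in_batches(label_iter, out_dir, batch_size=1000):
    """Consume (graph_key, label) pairs and dump each batch of
    `batch_size` labels to its own JSON file (labels_0.json, ...)."""
    os.makedirs(out_dir, exist_ok=True)
    batch, batch_idx = {}, 0
    for key, label in label_iter:
        batch[key] = label
        if len(batch) >= batch_size:
            # Flush this batch to disk and free the in-memory dict.
            with open(os.path.join(out_dir, f"labels_{batch_idx}.json"), "w") as f:
                json.dump(batch, f)
            batch, batch_idx = {}, batch_idx + 1
    if batch:
        # Write any remaining labels in a final partial batch.
        with open(os.path.join(out_dir, f"labels_{batch_idx}.json"), "w") as f:
            json.dump(batch, f)
```

Training could then use a modified Dataset that is given the list of keys and looks up each label in the right batch file (or per-key file) during the epoch, rather than loading one merged labels.json.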