Docs missing tutorial using LMDBDatabase and custom properties

FAIR-Chem / fairchem

FAIR Chemistry's library of machine learning methods for chemistry

https://opencatalystproject.org/

Docs missing tutorial using LMDBDatabase and custom properties #788

Open zulissimeta opened 1 month ago

zulissimeta commented 1 month ago

https://github.com/FAIR-Chem/fairchem/issues/787 highlights that our docs have a hole for users who want to train on molecule properties with custom outputs like homo-lumo gaps.

We should add a simple example to the tutorials, perhaps:

download qm9
write an ASE db
fine-tune a checkpoint for homo-lumo
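
A rough sketch of the data-preparation side of these steps, not an official recipe. It assumes the commonly distributed QM9 `.xyz` layout, where the second line of each file starts with the tag `gdb` followed by the molecule index and 15 scalar properties, with `homo`, `lumo`, and `gap` given in Hartree. The helper name `parse_qm9_properties`, the key `homo_lumo_gap_ev`, and the example values are hypothetical and chosen only for illustration.

```python
HARTREE_TO_EV = 27.211386  # approximate Hartree -> eV conversion factor

# Assumed column order of the QM9 property line, after the leading "gdb" tag.
QM9_COLUMNS = [
    "index", "A", "B", "C", "mu", "alpha", "homo", "lumo", "gap",
    "r2", "zpve", "U0", "U", "H", "G", "Cv",
]


def parse_qm9_properties(line: str) -> dict:
    """Parse one QM9 property line into a dict (hypothetical helper)."""
    fields = line.split()
    if not fields or fields[0] != "gdb":
        raise ValueError("expected a property line starting with 'gdb'")
    values = fields[1:]
    if len(values) != len(QM9_COLUMNS):
        raise ValueError(f"expected {len(QM9_COLUMNS)} values, got {len(values)}")
    props = {}
    for name, raw in zip(QM9_COLUMNS, values):
        # Some QM9 files write exponents as '*^' (e.g. 1.5*^-6) instead of 'e'.
        props[name] = float(raw.replace("*^", "e"))
    props["index"] = int(props["index"])
    # Custom target: HOMO-LUMO gap converted from Hartree to eV.
    props["homo_lumo_gap_ev"] = props["gap"] * HARTREE_TO_EV
    return props


# Illustrative property line; the numbers are placeholders, not reference data.
example_line = (
    "gdb 1 157.7118 157.70997 157.70699 0. 13.21 -0.3877 0.1171 0.5048 "
    "35.3641 0.044749 -40.47893 -40.476062 -40.475117 -40.498597 6.469"
)
props = parse_qm9_properties(example_line)
print(props["index"], round(props["homo_lumo_gap_ev"], 4))
```

The resulting dict could then be attached to the `info` attribute of the corresponding `ase.Atoms` object, so the custom property travels with the structure when it is written to the database.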

siyu-g commented 1 month ago

Hi Zach,

Thanks for your quick response. Long time no see! This is Bruno from Noa's group. I would be happy to contribute to the tutorial. During my internship last year, I believe I wrote the data preprocessing script and documentation to convert both the QM9 and OE62 data to LMDBs. I am writing to ask whether those scripts and docs are still available. If so, it would be much easier for me to generate the LMDB, write the tutorials, and use the OCP models in further applications.

Thanks, Bruno

siyu-g commented 1 month ago

Hi, I am just following up on my previous message. Is there any file that I can refer to when trying to train on a molecular property?

Bruno

zulissimeta commented 1 month ago

Sorry I missed this!

To write an ASE LMDB:

from fairchem.core.datasets.lmdb_database import LMDBDatabase

# atoms_list: an iterable of ase.Atoms objects, one per structure
with LMDBDatabase('my_dataset.aselmdb') as db:
    for atoms in atoms_list:
        db.write(atoms)
        # Optionally use db.write(atoms, data=atoms.info) to store atoms.info as data

Then refer to https://fair-chem.github.io/core/ase_dataset_creation.html for training. We should definitely iterate on this!

github-actions[bot] commented 4 days ago

This issue has been marked as stale because it has been open for 30 days with no activity.