FAIR-Chem / fairchem

FAIR Chemistry's library of machine learning methods for chemistry
https://opencatalystproject.org/

AtomsToGraph converter for single molecules #787

Open siyu-g opened 1 month ago

siyu-g commented 1 month ago

Hi,

I am working with some organic molecules. When converting ASE Atoms objects to the graph-representation LMDB, is there anything we need to change in the arguments listed here?

a2g = AtomsToGraphs(
    max_neigh=50,
    radius=6,
    r_energy=True,  # False for test data
    r_forces=True,  # False for test data
    r_distances=False,
    r_fixed=True,
)

I assume that r_energy and r_forces are not necessary: if I am training on some other property, say the HOMO-LUMO gap, then the forces and energies shouldn't be needed. Could you confirm that?

Thanks!

zulissimeta commented 1 month ago

Hi - I would highly recommend the new ASELMDB format in our repo. It's faster than the old LMDBs and is also ASE-compliant, so it is much easier to read, write, and edit. Graph creation then happens during data loading, but it is fast enough that this is not a big deal.

from fairchem.core.datasets.lmdb_database import LMDBDatabase

# atoms_list: an iterable of ase.Atoms objects that you have prepared beforehand
with LMDBDatabase('my_db.aselmdb') as connect:
    for atoms in atoms_list:
        connect.write(atoms)

Afterwards, see this page for ways to use the resulting database: https://fair-chem.github.io/fairchem/core/ase_dataset_creation.html

I'll open a GitHub issue to track two gaps:

  1. The docs don't mention LMDBDatabase, but we should promote it as the default.
  2. The tutorials (perhaps the fine-tuning one) should include an example of fitting properties other than energy/forces.

If you would be interested in contributing that tutorial (and/or working together on it), for example if you are fitting HOMO-LUMO gaps for an open dataset like QM9 or OE62, let us know! We would welcome a PR.

Hope that helps!

github-actions[bot] commented 1 week ago

This issue has been marked as stale because it has been open for 30 days with no activity.