atomistic-machine-learning / G-SchNet

G-SchNet - a generative model for 3d molecular structures
MIT License
129 stars 24 forks source link

how to calculate homo lumo gap of a new molecule generated by your model? #6

Closed sjchasel closed 2 years ago

sjchasel commented 2 years ago

It seems that your repository only consist the code of getting homo lumo gap from qm9 dataset. But how to calculate the homo lumo gap of a molecule if your model generate a novel one? Thank you!

NiklasGebauer commented 2 years ago

Hello, you can either use a neural network to predict the gap or compute it with reference calculations using a quantum chemistry package like ORCA (depending on the experiment, we do both in our publications). However, code for this is not included in this repository.

The fastest way is to use the pre-trained SchNet model from SchNetPack here: https://schnetpack.readthedocs.io/en/stable/getstarted/benchmarks.html The MAE of that model is 0.074 eV. You can expect the error to be a bit larger on your generated molecules since they are not perfectly in equilibrium state (and the SchNet network is trained on equilibrium structures), but it is a good proxy for evaluation and filtering of the generated structures.

The code could be similar to this (not tested, just a starting point):

from tqdm import tqdm
from schnetpack.data import AtomsLoader, AtomsData
from ase.db import connect
import numpy as np
import torch

# load Dataset
db_path = "/path/to/generated_molecules.db"
with connect(db_path) as con:
    n_molecules = con.count()
bs = 100  # batch size
n_batches = int(np.ceil(n_molecules / bs))

loader = AtomsLoader(AtomsData(db_path, load_only=[]), batch_size=bs, num_workers=4)

property = "gap"
model_path = "/path/to/pretrained/schnet/gap/best_model"
print(f'Predicting {property} for {n_molecules} molecules in '
      f'{n_batches} batches!')
model = torch.load(model_path, map_location="cuda")
prop = np.empty(n_molecules)
with torch.no_grad():
    for i, batch in tqdm(enumerate(loader), total=n_batches):
        for k, v in batch.items():
            batch[k] = v.to("cuda")
        result = model(batch)[property]
        prop[i * bs:(i + 1) * bs] = result.cpu().numpy()[:, 0]
        del batch
        del result

# save predicted property values
np.savetxt("/path/to/results.txt", prop)

Hope this helps!