aiqm / torchani

Accurate Neural Network Potential on PyTorch
https://aiqm.github.io/torchani/
MIT License

Generating Datasets? #611

Closed rschireman closed 2 years ago

rschireman commented 2 years ago

Hi all,

I see that a few datasets are available after running download.sh, but is there any documentation on how to create a database of coordinates, forces, and energies in hdf5 format? I have a lot of data in ASE db format (sqlite3) and it would be awesome to use it with torchani.

Best, Ray

shubbey commented 2 years ago

You can try something like this:

import h5py
import numpy as np  # needed for the np.* calls below

def write_h5(path, tds):  # tds == a struct containing some info you've loaded from your db format
    with h5py.File(path, 'w') as hf:  # the file is closed when the block exits
        for td in tds:  # td == each molecule
            # td.tms == conformers of the molecule, each with unique coords/energy
            np_energies = np.empty(len(td.tms))
            np_coords = np.empty([len(td.tms), len(td.tms[0].coords), 3])
            for m, tm in enumerate(td.tms):
                np_energies[m] = tm.energy  # target energy for conformer
                np_coords[m] = tm.coords  # (n_atoms, 3) block of x, y, z per atom

            grp = hf.create_group(td.label)  # some unique id for this molecule group
            grp.create_dataset("energies", data=np_energies)
            grp.create_dataset("coordinates", data=np_coords)
            grp.create_dataset("species", data=np.bytes_(td.species))  # species = e.g. ['C','H','O'...]
            hf.flush()
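
The write_h5 function above expects tds to be a list of molecules, each with a label, a species list, and a list of conformers (tms) that all share the same atom order. Data exported from a flat database usually comes as one record per structure, so it has to be grouped first. Below is a minimal sketch of that grouping step using only the standard library. The names Conformer, Molecule, and group_conformers are hypothetical and not part of torchani or ASE, and the input is assumed to be plain (symbols, coords, energy) tuples. With an ASE database, such tuples could presumably be built from each row's symbols, positions, and energy, but that part is not shown here.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Conformer:  # hypothetical container, matches the "tm" objects read by write_h5
    coords: list  # one [x, y, z] entry per atom
    energy: float  # target energy for this conformer


@dataclass
class Molecule:  # hypothetical container, matches the "td" objects read by write_h5
    label: str  # unique id, used as the HDF5 group name
    species: list  # e.g. ['O', 'H', 'H'], same order as the coords
    tms: list = field(default_factory=list)  # conformers of this molecule


def group_conformers(records):
    """Group flat (symbols, coords, energy) records by their ordered species list.

    Records with the same symbols in the same order end up in one Molecule,
    so every conformer in a group has coordinates of identical shape.
    """
    groups = defaultdict(list)
    for symbols, coords, energy in records:
        groups[tuple(symbols)].append(Conformer(coords=coords, energy=energy))
    return [
        Molecule(label=f"mol_{i:05d}", species=list(key), tms=tms)
        for i, (key, tms) in enumerate(groups.items())
    ]


# Made-up example records: two water conformers and one H2 structure.
records = [
    (["O", "H", "H"], [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]], -76.40),
    (["O", "H", "H"], [[0.0, 0.0, 0.0], [0.98, 0.0, 0.0], [-0.25, 0.95, 0.0]], -76.39),
    (["H", "H"], [[0.0, 0.0, 0.0], [0.74, 0.0, 0.0]], -1.17),
]
tds = group_conformers(records)
```

The resulting tds list can then be passed to write_h5(path, tds). Grouping by the ordered symbol tuple is a deliberate simplification: structures with the same atoms listed in a different order land in separate groups, which keeps each coordinates array consistent with its species list.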

rschireman commented 2 years ago

This works! Thank you for your help.