Neon7799 closed this issue 1 year ago.
I only started using SchNetPack (Spk) in the past week, but I ran into a similar problem. I think Spk looks up reference energies as atom_ref[Z], where Z is the atomic number of the species. I solved this by making the atom_ref list a list of zeros indexed by atomic number, long enough to reach the heaviest element I would need. I set the maximum index to uranium (Z = 92), simply because it's the heaviest element I've seen in an MLIAP.
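# index = atomic number Z (index 0 is unused); 93 entries cover Z = 1..92, i.e. up to uranium
lst_ats = [0.0 for i in range(0, 93)]
lst_ats[1] = E_H    # H; E_H, E_O and E_Zr are your own reference energies
lst_ats[8] = E_O    # O
lst_ats[40] = E_Zr  # Zr
atom_refs = {
    'energy': lst_ats
}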
Then pass that dict as the atomrefs argument when creating the dataset.
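Because the lookup is atom_ref[Z], the same list can be built for any set of elements with a small helper. A minimal sketch, assuming you know the atomic numbers of your elements: build_atomref_list is a hypothetical helper (not part of SchNetPack), and the energies are dummy values to replace with your own. The last lines show the lookup by atomic number and the summed reference energy of one molecule, which is the kind of per-molecule offset an atomref correction is meant to remove.

```python
import numpy as np


def build_atomref_list(refs_by_z, max_z=92):
    """Hypothetical helper: return a list where entry Z holds the reference energy of element Z.

    refs_by_z maps atomic number -> reference energy.
    Elements that are not listed keep a reference energy of 0.0.
    """
    if max(refs_by_z) > max_z:
        raise ValueError("max_z must be at least the largest atomic number used")
    refs = [0.0] * (max_z + 1)  # index 0 is unused, indices 1..max_z are elements
    for z, energy in refs_by_z.items():
        refs[z] = energy
    return refs


# Dummy energies for H (Z=1), O (Z=8) and Zr (Z=40); substitute your own values.
atom_refs = {"energy": build_atomref_list({1: -1.0, 8: -2.0, 40: -3.0})}

# Look up every atom's reference energy by its atomic number, here for H2O.
numbers = np.array([8, 1, 1])
per_atom = np.asarray(atom_refs["energy"])[numbers]
offset = per_atom.sum()  # -2.0 + -1.0 + -1.0
```

Padding with zeros keeps the list valid for any atomic number up to max_z, at the cost of a few unused entries.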
Edit: I found this older issue while looking up something else; it gives a more direct answer: https://github.com/atomistic-machine-learning/schnetpack/issues/218#issuecomment-597250608
Thanks, it's really helpful
Hi, my dataset (nearly 20,000 structures) contains five elements (H, C, N, O and Zn), and the geometries can have different numbers of atoms. I can convert it from .npz to .db correctly, but when splitting the dataset I get an error. Here is my code:
import os
from schnetpack.data import ASEAtomsData
from ase import Atoms
import torch
from torch.optim import Adam
import schnetpack.transform as trn
import numpy as np
from schnetpack.data import *

# remove any existing database so it is rebuilt from scratch
if os.path.exists('metal.db'):
    os.remove('metal.db')

data = np.load('./metal.npz', allow_pickle=True)
atoms_list = []
property_list = []
for numbers, positions, energies in zip(data["Z"], data["R"], data["E"]):
    ats = Atoms(positions=positions, numbers=numbers)
    properties = {'energy': energies}
    property_list.append(properties)
    atoms_list.append(ats)

# one reference energy per element, in the order H, C, N, O, Zn
atomrefs = {'energy': [-0.598680709282, -38.770836588232, -55.473973919248,
                       -73.967208936440, -1805.489171255718]}
newdataset = ASEAtomsData.create(
    './metal.db', distance_unit='Ang',
    property_unit_dict={'energy': 'Ha'}, atomrefs=atomrefs,
)
newdataset.add_systems(property_list, atoms_list)

example = newdataset[0]
for k, v in example.items():
    print('-', k, ':', v.shape)

data_module = AtomsDataModule(
    datapath='metal.db', format=AtomsDataFormat.ASE,
    batch_size=100, num_train=10000, num_val=5000,
    transforms=[
        trn.ASENeighborList(cutoff=5.),
        trn.RemoveOffsets("energy", remove_mean=True, remove_atomrefs=True),
        trn.CastTo32(),
    ],
    num_workers=1, pin_memory=False,
    property_units={'energy': 'Ha'}, distance_unit="Ang",
    load_properties=["energy"],
)
data_module.prepare_data()
data_module.setup()  # the error is raised on this line

Structure.Z is the total number of atoms in each molecule, and 'size 5' refers to the five atom reference energies, but I don't know how to fix the issue. I tested the QM9 dataset with the QM9 module and it worked well, but when I tested uracil.npz from the tutorial, I got the same error.
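Following the earlier answer in this thread, the likely cause is that atomrefs are looked up by atomic number, so a five-entry list only has indices 0 to 4 while C, N, O and Zn have Z = 6, 7, 8 and 30. That would be consistent with an error mentioning 'size 5'. A minimal sketch of the fix, reusing the five energies from the question and needing only NumPy; max_z = 92 follows the earlier uranium cutoff and is an assumption (any length above 30 would cover this dataset).

```python
import numpy as np

# The five reference energies (Ha) from the question, in the order H, C, N, O, Zn.
elements_z = [1, 6, 7, 8, 30]  # atomic numbers of H, C, N, O, Zn
energies = [-0.598680709282, -38.770836588232, -55.473973919248,
            -73.967208936440, -1805.489171255718]

# A list indexed by position (0..4) cannot be looked up by atomic number.
short_refs = np.asarray(energies)
try:
    short_refs[np.array(elements_z)]
except IndexError as err:
    print("lookup by Z fails:", err)

# Instead, store each energy at the index equal to its atomic number.
max_z = 92  # assumption: cover elements up to uranium, as in the earlier answer
refs = [0.0] * (max_z + 1)
for z, e in zip(elements_z, energies):
    refs[z] = e
atomrefs = {"energy": refs}  # pass this as atomrefs=... to ASEAtomsData.create
```

Since the atomrefs are given when the database is created, metal.db would most likely need to be regenerated with the corrected dict.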