generate dataset for torchani

aiqm / torchani

Accurate Neural Network Potential on PyTorch

MIT License


Hi, I have a problem creating my own dataset to use later for training. I'm a beginner with h5py and I don't understand how the datasets should be formatted. I am trying to use the last part of #611, where my species look like this for one molecule:

    array([['O', 'C', 'O'], ['O', 'C', 'O'],... ['O', 'C', 'O']])

The coordinates are in the form:

    [array([[[ 0. , 0. , 1.237479], [ 0. , 0. , -0.3 ], [ 0. , 0. , -1.237479]]]),...]

and the energies:

    [array(226.56324331), array(208.34163576), array(191.23083335),...]

I've also tried other formats, which I saved using torchani.data._pyanitools.datapacker('./path_to_file', mode = 'w'). After loading them with torchani.data.load('./path_to_file'), they were transformed into dictionaries, as in the examples in ani_gdb_s01.h5. However, an error is raised in the training part.

If you have any suggestions, please let me know. Thank you in advance.

    import h5py
    import numpy as np

    # `train` is a list of ASE.Atoms objects
    with h5py.File('train.hdf5', 'w') as hdf5:
        for i, atoms in enumerate(train):
            natoms = len(atoms)
            # One group per structure; every array gets a leading axis of length 1
            g = hdf5.create_group(str(i))
            g.create_dataset('energies', data=np.atleast_1d(atoms.info['energy']))
            g.create_dataset('cell', data=np.array(atoms.cell).reshape((1, 3, 3)))
            g.create_dataset('coordinates', data=atoms.positions.reshape((1, natoms, 3)))
            g.create_dataset('force', data=atoms.arrays['forces'].reshape((1, natoms, 3)))
            # Every atom is labelled as carbon here
            g.create_dataset('species', data=[b'C']*natoms)
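For the CO2 data in the question, the same idea can be sketched without ASE. This is only a rough example, and it assumes the layout implied by the snippet above: one group per molecule, `species` stored once per atom (not once per conformation), and `coordinates` and `energies` stacked along a leading conformation axis. `write_molecule` is a hypothetical helper, not part of torchani. The second and third geometries are made-up placeholders, because the question only shows the first one.

```python
import os
import tempfile

import h5py
import numpy as np


def write_molecule(h5file, name, symbols, coordinates, energies):
    """Write all conformations of one molecule into one HDF5 group.

    Hypothetical helper for illustration; not a torchani function.
    """
    natoms = len(symbols)
    # A list of (1, natoms, 3) arrays becomes a single (nconf, natoms, 3) array
    coordinates = np.asarray(coordinates, dtype=np.float64).reshape(-1, natoms, 3)
    # A list of 0-d arrays becomes a single (nconf,) array
    energies = np.asarray(energies, dtype=np.float64).reshape(-1)
    if len(energies) != len(coordinates):
        raise ValueError('need exactly one energy per conformation')

    group = h5file.create_group(name)
    # Species are stored once per atom, as byte strings
    group.create_dataset('species', data=[s.encode('ascii') for s in symbols])
    group.create_dataset('coordinates', data=coordinates)
    group.create_dataset('energies', data=energies)


# Data shaped like the question: species repeated for every conformation
species = np.array([['O', 'C', 'O']] * 3)
coordinates = [
    np.array([[[0.0, 0.0, 1.237479], [0.0, 0.0, -0.3], [0.0, 0.0, -1.237479]]]),
    np.array([[[0.0, 0.0, 1.2], [0.0, 0.0, -0.2], [0.0, 0.0, -1.2]]]),  # placeholder
    np.array([[[0.0, 0.0, 1.18], [0.0, 0.0, -0.1], [0.0, 0.0, -1.18]]]),  # placeholder
]
energies = [np.array(226.56324331), np.array(208.34163576), np.array(191.23083335)]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'co2.h5')
    with h5py.File(path, 'w') as h5:
        # Only the first row of `species` is needed: the atoms do not change
        write_molecule(h5, 'co2', species[0], coordinates, energies)
    with h5py.File(path, 'r') as h5:
        shapes = {key: h5['co2'][key].shape for key in h5['co2']}

print(shapes)
```

Reading the file back should show `species` with one entry per atom, and `coordinates` and `energies` sharing the same leading length. Whether `torchani.data.load` accepts the file as-is may depend on the torchani version, so treat this as a starting point to compare against ani_gdb_s01.h5.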

generate dataset for torchani #622