MichailDanikas opened this issue 2 years ago (status: Open)
Probably a bit late for the original poster, but here's what I do to convert from a list of ASE.Atoms objects. I'm not sure if it's 100% correct, but it seems to work fine.
    import h5py
    import numpy as np

    # `train` is a list of ASE.Atoms objects
    with h5py.File('train.hdf5', 'w') as hdf5:
        for i, atoms in enumerate(train):
            natoms = len(atoms)
            # one group per structure, each holding a single conformation
            g = hdf5.create_group(str(i))
            g.create_dataset('energies', data=np.atleast_1d(atoms.info['energy']))
            g.create_dataset('cell', data=np.array(atoms.cell).reshape((1, 3, 3)))
            g.create_dataset('coordinates', data=atoms.positions.reshape((1, natoms, 3)))
            g.create_dataset('force', data=atoms.arrays['forces'].reshape((1, natoms, 3)))
            # hard-coded for an all-carbon system; other systems need their own symbols
            g.create_dataset('species', data=[b'C']*natoms)
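Since I'm not certain the layout is right, a quick way to check what actually ended up in the file is to read it back and print the dataset shapes. This is only a minimal sketch: `summarize_hdf5` and the file name `check_demo.hdf5` are hypothetical names, and the stand-in values are made up so the example runs without ASE.

```python
import h5py
import numpy as np


def summarize_hdf5(path):
    """Return {group_name: {dataset_name: shape}} for every top-level group."""
    summary = {}
    with h5py.File(path, "r") as hdf5:
        for name, group in hdf5.items():
            summary[name] = {key: group[key].shape for key in group.keys()}
    return summary


# Tiny stand-in file with the same layout as the snippet above
# (one group per structure; all values here are made up).
natoms = 2
with h5py.File("check_demo.hdf5", "w") as hdf5:
    g = hdf5.create_group("0")
    g.create_dataset("energies", data=np.atleast_1d(-1.5))
    g.create_dataset("cell", data=np.eye(3).reshape((1, 3, 3)))
    g.create_dataset("coordinates", data=np.zeros((1, natoms, 3)))
    g.create_dataset("force", data=np.zeros((1, natoms, 3)))
    g.create_dataset("species", data=[b"C"] * natoms)

shapes = summarize_hdf5("check_demo.hdf5")
for key, shape in shapes["0"].items():
    print(key, shape)
```

Running the same function on `train.hdf5` should show one `species` entry per atom and a leading conformation axis of length 1 on the other datasets.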
Hi, I have a problem creating my own dataset to use later for training. I'm a beginner with h5py and I don't understand how the datasets should be formatted. I am trying to use the last part of #611, where my species look like this:
array([['O', 'C', 'O'], ['O', 'C', 'O'],... ['O', 'C', 'O']])
for one molecule. The coordinates are in the form:
[array([[[ 0. , 0. , 1.237479], [ 0. , 0. , -0.3 ], [ 0. , 0. , -1.237479]]]),...]
and the energies:
[array(226.56324331), array(208.34163576), array(191.23083335),...]
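For reference, here is a rough sketch of how arrays shaped like these could be packed into one HDF5 group per molecule with plain h5py. It assumes a layout of one `species` entry per atom, `coordinates` of shape (conformations, atoms, 3), and `energies` of shape (conformations,). Whether that is exactly what `torchani.data.load` expects is an assumption that should be checked against ani_gdb_s01.h5. The second geometry, the group name `co2`, and the file name `co2_demo.hdf5` are made up for illustration.

```python
import h5py
import numpy as np

# Stand-ins shaped like the arrays described above: species repeated per
# conformation, one (1, natoms, 3) array per geometry, one 0-d energy each.
# The second geometry is invented so the example has two conformations.
species = np.array([["O", "C", "O"], ["O", "C", "O"]])
coordinates = [
    np.array([[[0.0, 0.0, 1.237479], [0.0, 0.0, -0.3], [0.0, 0.0, -1.237479]]]),
    np.array([[[0.0, 0.0, 1.2], [0.0, 0.0, 0.0], [0.0, 0.0, -1.2]]]),
]
energies = [np.array(226.56324331), np.array(208.34163576)]

# Assumption: every conformation uses the same atom ordering, so a single
# row of species describes the whole molecule.
assert (species == species[0]).all()
species_bytes = [s.encode("ascii") for s in species[0]]

# Stack the per-geometry arrays into (conformations, atoms, 3) and
# flatten the 0-d energies into a 1-d array of length `conformations`.
coords = np.concatenate(coordinates, axis=0)
energy = np.array([float(e) for e in energies])

with h5py.File("co2_demo.hdf5", "w") as hdf5:
    g = hdf5.create_group("co2")
    g.create_dataset("species", data=species_bytes)
    g.create_dataset("coordinates", data=coords)
    g.create_dataset("energies", data=energy)
```

The main difference from the arrays in the post is that `species` is stored once per molecule rather than once per conformation, and the list of small arrays becomes a single array with a leading conformation axis.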
I've also tried other formats, which I saved using:
torchani.data._pyanitools.datapacker('./path_to_file', mode = 'w')
After loading them with:
torchani.data.load('./path_to_file')
they were transformed into dictionaries, like the examples in ani_gdb_s01.h5. However, during training the following error is raised:
If you have any suggestions, please let me know. Thank you in advance.