Open avanteijlingen opened 9 months ago
Hi, thanks for contributing to TorchANI!
Could you tell me how you got the error KeyError: b'C' that you mentioned?
When I create an HDF5 dataset and then load it into ANI, the species table always contains the atoms as bytes (b'C', b'H', etc.), which it doesn't recognise unless they are first converted with .decode().
I always create the HDF5 datasets roughly like this:
mol, E, C, S, F = [],[],[],[],[]
mol.append(HDF5_Dataset.create_group(groupname))
E.append(mol[-1].create_dataset("energies", (energies.shape[0],), dtype='float64'))
E[-1][()] = energies
C.append(mol[-1].create_dataset("coordinates", Conformers.shape, dtype='float64'))
C[-1][()] = Conformers
species = np.array(species.split(), dtype="<U2")
species = np.array(species, dtype=h5py.special_dtype(vlen=str))
S.append(mol[-1].create_dataset("species", data=species))
atom_types = np.unique(np.hstack((atom_types, species)))
Error:
File ~\anaconda3\lib\site-packages\torchani\data\__init__.py:164 in reenterable_iterable_factory
    d['species'] = numpy.array([idx[s] for s in d['species']], dtype='i8')
File ~\anaconda3\lib\site-packages\torchani\data\__init__.py:164 in <listcomp>
    d['species'] = numpy.array([idx[s] for s in d['species']], dtype='i8')
KeyError: b'C'
The proposed fix allows the program to work whether it parses the atomic labels as bytes or as strings, by calling .decode() inside a try/except block.
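To illustrate the idea, here is a minimal standalone sketch of that approach. It is not the actual TorchANI code: the helper names normalize_symbol and species_to_indices and the example species order are hypothetical, chosen only to mirror the idx[s] lookup shown in the traceback. As far as I understand it, h5py 3.x returns variable-length strings as bytes by default, which is why the lookup receives b'C' instead of 'C'.

```python
def normalize_symbol(symbol):
    """Return an element symbol as str, decoding it first if it is bytes."""
    try:
        return symbol.decode("ascii")
    except AttributeError:
        # str has no .decode(), so the symbol is already text
        return symbol


def species_to_indices(species, species_order):
    """Map element symbols (bytes or str) to integer indices."""
    idx = {s: i for i, s in enumerate(species_order)}
    return [idx[normalize_symbol(s)] for s in species]


# Hypothetical species order, for illustration only
order = ["H", "C", "N", "O"]
from_hdf5 = [b"C", b"H", b"H"]    # labels as read back from the HDF5 file
from_memory = ["C", "H", "H"]     # labels that are already str

indices_bytes = species_to_indices(from_hdf5, order)
indices_str = species_to_indices(from_memory, order)
print(indices_bytes, indices_str)
```

Catching AttributeError rather than using a bare except keeps unrelated errors visible, and the same lookup then works for both kinds of labels.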