aiqm / torchani

Accurate Neural Network Potential on PyTorch
https://aiqm.github.io/torchani/
MIT License

Every time I reinstall torchani I get an error message about atoms being bytes instead of strings; this fixes it easily. #639

Open avanteijlingen opened 9 months ago

avanteijlingen commented 9 months ago

Error:

File ~\anaconda3\lib\site-packages\torchani\data\__init__.py:164 in reenterable_iterable_factory
    d['species'] = numpy.array([idx[s] for s in d['species']], dtype='i8')

File ~\anaconda3\lib\site-packages\torchani\data\__init__.py:164 in <listcomp>
    d['species'] = numpy.array([idx[s] for s in d['species']], dtype='i8')

KeyError: b'C'

The proposed fix lets the program work whether it parses the atomic labels as bytes or as strings, by calling .decode() inside a try/except block.
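The idea described above can be sketched roughly as follows. This is not the actual diff from the pull request: the helper names to_str and species_to_indices and the example idx mapping are hypothetical, chosen only to illustrate decoding inside a try/except so that both bytes and str labels resolve to the same index.

```python
def to_str(label):
    """Return an atomic label as str, decoding it if it arrived as bytes."""
    try:
        # bytes (and numpy.bytes_) objects provide .decode()
        return label.decode("ascii")
    except AttributeError:
        # str has no .decode(), so the label was already text
        return label


def species_to_indices(species, idx):
    """Map atomic labels (bytes or str) to integer indices via idx."""
    return [idx[to_str(s)] for s in species]


# Hypothetical symbol-to-index table, assumed only for this example
idx = {"H": 0, "C": 1, "N": 2, "O": 3}

print(species_to_indices([b"C", "H", b"H"], idx))  # prints [1, 0, 0]
```

Catching AttributeError rather than using a bare except keeps unrelated failures, such as a label that is genuinely missing from idx, visible as the usual KeyError.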

yueyericardo commented 9 months ago

Hi, thanks for contributing to TorchANI! Could you tell me how you got the KeyError: b'C' you mentioned?

avanteijlingen commented 9 months ago

SAMPLE.zip

When I make an HDF5 dataset and then load it into ANI, it always finds the species table to contain the atoms as b'C', b'H', etc., which it then doesn't recognise without calling .decode().

I always create the HDF5 datasets roughly like this:

mol, E, C, S, F = [], [], [], [], []

mol.append(HDF5_Dataset.create_group(groupname))

E.append(mol[-1].create_dataset("energies", (energies.shape[0],), dtype='float64'))
E[-1][()] = energies
C.append(mol[-1].create_dataset("coordinates", Conformers.shape, dtype='float64'))
C[-1][()] = Conformers

species = np.array(species.split(), dtype="<U2")
species = np.array(species, dtype=h5py.special_dtype(vlen=str))

S.append(mol[-1].create_dataset("species", data=species))
atom_types = np.unique(np.hstack((atom_types, species)))