atomistic-machine-learning / schnetpack

SchNetPack - Deep Neural Networks for Atomistic Systems

TypeError: data type 'str448' not understood #626

Closed. Kailejiang closed this issue 2 months ago.

Kailejiang commented 2 months ago

Dear all, after I successfully prepared md17_uracil.npz and trained on it following examples/tutorials/tutorial_01_preparing_data.ipynb, I tried to prepare my own dataset, 16p.npz, but the following error occurred:

Traceback (most recent call last):
  File "C:\Users\WHQ\Desktop\schnetpack-master\qm9_3_use.py", line 62, in <module>
    custom_data.setup()
  File "C:\Users\WHQ\Desktop\schnetpack-master\.venv\lib\site-packages\schnetpack\data\datamodule.py", line 193, in setup
    self._setup_transforms()
  File "C:\Users\WHQ\Desktop\schnetpack-master\.venv\lib\site-packages\schnetpack\data\datamodule.py", line 318, in _setup_transforms
    t.datamodule(self)
  File "C:\Users\WHQ\Desktop\schnetpack-master\.venv\lib\site-packages\schnetpack\transform\atomistic.py", line 126, in datamodule
    stats = _datamodule.get_stats(
  File "C:\Users\WHQ\Desktop\schnetpack-master\.venv\lib\site-packages\schnetpack\data\datamodule.py", line 334, in get_stats
    stats = calculate_stats(
  File "C:\Users\WHQ\Desktop\schnetpack-master\.venv\lib\site-packages\schnetpack\data\stats.py", line 44, in calculate_stats
    for props in tqdm(dataloader):
  File "C:\Users\WHQ\Desktop\schnetpack-master\.venv\lib\site-packages\tqdm\std.py", line 1181, in __iter__
    for obj in iterable:
  File "C:\Users\WHQ\Desktop\schnetpack-master\.venv\lib\site-packages\torch\utils\data\dataloader.py", line 631, in __next__
    data = self._next_data()
  File "C:\Users\WHQ\Desktop\schnetpack-master\.venv\lib\site-packages\torch\utils\data\dataloader.py", line 1346, in _next_data
    return self._process_data(data)
  File "C:\Users\WHQ\Desktop\schnetpack-master\.venv\lib\site-packages\torch\utils\data\dataloader.py", line 1372, in _process_data
    data.reraise()
  File "C:\Users\WHQ\Desktop\schnetpack-master\.venv\lib\site-packages\torch\_utils.py", line 722, in reraise
    raise exception
TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "C:\Users\WHQ\Desktop\schnetpack-master\.venv\lib\site-packages\torch\utils\data\_utils\worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "C:\Users\WHQ\Desktop\schnetpack-master\.venv\lib\site-packages\torch\utils\data\_utils\fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\Users\WHQ\Desktop\schnetpack-master\.venv\lib\site-packages\torch\utils\data\_utils\fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\Users\WHQ\Desktop\schnetpack-master\.venv\lib\site-packages\schnetpack\data\atoms.py", line 269, in __getitem__
    props = self._get_properties(
  File "C:\Users\WHQ\Desktop\schnetpack-master\.venv\lib\site-packages\schnetpack\data\atoms.py", line 357, in _get_properties
    torch.tensor(row.data[pname].copy()) * self.conversions[pname]
  File "C:\Users\WHQ\Desktop\schnetpack-master\.venv\lib\site-packages\ase\db\row.py", line 152, in data
    self._data = bytes_to_object(self._data)  # lazy decoding
  File "C:\Users\WHQ\Desktop\schnetpack-master\.venv\lib\site-packages\ase\db\core.py", line 628, in bytes_to_object
    return b2o(obj, b)
  File "C:\Users\WHQ\Desktop\schnetpack-master\.venv\lib\site-packages\ase\db\core.py", line 683, in b2o
    dct = {key: b2o(value, b) for key, value in obj.items()}
  File "C:\Users\WHQ\Desktop\schnetpack-master\.venv\lib\site-packages\ase\db\core.py", line 683, in <dictcomp>
    dct = {key: b2o(value, b) for key, value in obj.items()}
  File "C:\Users\WHQ\Desktop\schnetpack-master\.venv\lib\site-packages\ase\db\core.py", line 675, in b2o
    dtype = np.dtype(name)
TypeError: data type 'str448' not understood

The code, adapted from the tutorial, is as follows:

import os
import numpy as np
from ase import Atoms

import schnetpack as spk
import schnetpack.transform as trn
from schnetpack.data import ASEAtomsData

qm9tut = './16p_test'
if not os.path.exists(qm9tut):
    os.makedirs(qm9tut)

data = np.load('./16p.npz')

numbers = data["z"]
atoms_list = []
property_list = []
for positions, energies in zip(data["R"], data["E"]):
    ats = Atoms(positions=positions, numbers=numbers)
    properties = {"energy_U0": energies}
    property_list.append(properties)
    atoms_list.append(ats)

new_dataset = ASEAtomsData.create(
    './16p.db',
    distance_unit='Ang',
    property_unit_dict={'energy_U0':'eV'}
)

new_dataset.add_systems(property_list, atoms_list)

custom_data = spk.data.AtomsDataModule(
    './16p.db',
    batch_size=10,
    distance_unit='Ang',
    property_units={"energy_U0":'eV'},
    num_train=440,
    num_val=40,
    transforms=[
        trn.ASENeighborList(cutoff=5.),
        trn.RemoveOffsets("energy_U0", remove_mean=True, remove_atomrefs=False),
        trn.CastTo32()
    ],
    num_workers=1,
    pin_memory=True, # set to False when not using a GPU
)
custom_data.prepare_data()

if __name__ == '__main__':
    custom_data.setup()

My dataset 16p.npz is attached (16p.zip). I can provide more details and files if you have any ideas.

Thank you for any help.

Kailejiang commented 2 months ago

I think I may have found the problem. I made an error when reading the energy information (the data read there should be the energy in my case). I will try to rebuild the database. (screenshot attached: debug2)
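
One quick way to confirm this is to check how the energies are stored in the npz file before writing the database. A minimal diagnostic sketch, assuming the energies live under the "E" key as in the script above; an object or string dtype instead of a float dtype points to exactly this problem (the 'str448' in the traceback looks like numpy's name for a fixed-width string dtype, i.e. the property ended up serialized as text):

import numpy as np

# allow_pickle is only needed if the array was saved as a Python object array
data = np.load('./16p.npz', allow_pickle=True)
print(data["E"].dtype, data["E"].shape)
# float64 is what ASE/SchNetPack expect; dtype('O') or a str dtype means
# the energies were stored as lists/strings rather than numbers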

Kailejiang commented 2 months ago

I solved the problem. When writing the database, I had mistakenly stored the energies from the npz array as a Python list, which caused this error. Converting the energy data to float64 before writing the database avoids it.
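
For anyone hitting the same error, a minimal sketch of that fix, assuming one energy per frame stored under the "E" key as in the script above: cast every property to a float64 numpy array before calling add_systems, so that the ASE database serializes a numeric dtype instead of text:

import numpy as np
from ase import Atoms

atoms_list = []
property_list = []
for positions, energies in zip(data["R"], data["E"]):
    ats = Atoms(positions=positions, numbers=numbers)
    # cast to a 1-element float64 array so the database stores numbers, not a list/string
    energy = np.atleast_1d(np.asarray(energies, dtype=np.float64))
    property_list.append({"energy_U0": energy})
    atoms_list.append(ats)

new_dataset.add_systems(property_list, atoms_list)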

If the developers are also trying to find the cause of this issue, I'm sorry for wasting your time and appreciate your efforts.