ehoogeboom / e3_diffusion_for_molecules

MIT License

Questions regarding dataset error #23

Closed: Anonnoname closed this issue 7 months ago.

Anonnoname commented 1 year ago

After following the instructions for building the dataset, I encountered this error:

    Traceback (most recent call last):
      File "/home/jovyan/e3_diffusion_for_molecules/eval_sample.py", line 164, in <module>
        main()
      File "/home/jovyan/e3_diffusion_for_molecules/eval_sample.py", line 130, in main
        dataloaders, charge_scale = dataset.retrieve_dataloaders(args)
      File "/home/jovyan/e3_diffusion_for_molecules/qm9/dataset.py", line 45, in retrieve_dataloaders
        split_data = build_geom_dataset.load_split_data(data_file,
      File "/home/jovyan/e3_diffusion_for_molecules/build_geom_dataset.py", line 107, in load_split_data
        val_data, test_data, train_data = np.split(data_list, [val_index, test_index])
      File "<__array_function__ internals>", line 200, in split
      File "/root/mambaforge/envs/my-rdkit-env/lib/python3.11/site-packages/numpy/lib/shape_base.py", line 874, in split
        return array_split(ary, indices_or_sections, axis)
      File "<__array_function__ internals>", line 200, in array_split
      File "/root/mambaforge/envs/my-rdkit-env/lib/python3.11/site-packages/numpy/lib/shape_base.py", line 786, in array_split
        sary = _nx.swapaxes(ary, axis, 0)
      File "<__array_function__ internals>", line 200, in swapaxes
      File "/root/mambaforge/envs/my-rdkit-env/lib/python3.11/site-packages/numpy/core/fromnumeric.py", line 594, in swapaxes
        return _wrapfunc(a, 'swapaxes', axis1, axis2)
      File "/root/mambaforge/envs/my-rdkit-env/lib/python3.11/site-packages/numpy/core/fromnumeric.py", line 54, in _wrapfunc
        return _wrapit(obj, method, *args, **kwds)
      File "/root/mambaforge/envs/my-rdkit-env/lib/python3.11/site-packages/numpy/core/fromnumeric.py", line 43, in _wrapit
        result = getattr(asarray(obj), method)(*args, **kwds)
    ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (6922516,) + inhomogeneous part.

Can you tell me how to fix this error? Thank you!

ehoogeboom commented 7 months ago

Not sure, could be corrupted data? Never seen this before myself.

TianBian95 commented 1 month ago

Inserting data_list = np.array(data_list, dtype=object) at line 102 of build_geom_dataset.py could solve this problem:

    perm = np.load(os.path.join(base_path, 'geom_permutation.npy'))
    data_list = [data_list[i] for i in perm]
    # New line: wrap the ragged per-molecule arrays in a 1-D object array so
    # that np.split does not try to stack them into one regular array.
    data_list = np.array(data_list, dtype=object)
    num_mol = len(data_list)
    val_index = int(num_mol * val_proportion)
    test_index = val_index + int(num_mol * test_proportion)
    val_data, test_data, train_data = np.split(data_list, [val_index, test_index])
    return train_data, val_data, test_data
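A minimal sketch of why the extra line helps, on hypothetical toy data. It assumes each entry of data_list is a NumPy array with one row per atom, so the entries differ in length, which is consistent with the "(6922516,) + inhomogeneous part" shape in the traceback. Since NumPy 1.24, converting such a ragged list to an array without dtype=object raises this ValueError, and the traceback shows np.split doing exactly that conversion (asarray inside _wrapit). The helper split_ragged is hypothetical, not part of the repository: it reuses the index arithmetic of load_split_data but slices the Python list directly.

```python
import numpy as np


def split_ragged(data_list, val_proportion=0.1, test_proportion=0.1):
    """Split a list of variable-length arrays into train/val/test parts.

    Hypothetical helper: same index arithmetic as load_split_data, but it
    slices the Python list instead of calling np.split on it.
    """
    num_mol = len(data_list)
    val_index = int(num_mol * val_proportion)
    test_index = val_index + int(num_mol * test_proportion)
    val_data = data_list[:val_index]
    test_data = data_list[val_index:test_index]
    train_data = data_list[test_index:]
    return train_data, val_data, test_data


# Toy stand-in for the real data: one (n_atoms, 4) array per molecule, with a
# different n_atoms each time (the "inhomogeneous part" of the error).
rng = np.random.default_rng(0)
data_list = [rng.normal(size=(n, 4)) for n in (3, 5, 4, 7, 2, 6, 3, 5, 4, 8)]

# np.split converts its input with asarray; on NumPy >= 1.24 a ragged list
# raises ValueError, while older versions return an object array instead.
try:
    np.asarray(data_list)
    print("converted without error (older NumPy)")
except ValueError as err:
    print("ValueError:", err)

# Workaround from this thread: an explicit 1-D object array splits fine.
obj = np.array(data_list, dtype=object)
val_a, test_a, train_a = np.split(obj, [1, 2])

# Alternative: slice the list and skip NumPy's shape inference entirely.
train, val, test = split_ragged(data_list)
print(len(train), len(val), len(test))  # 8 1 1
```

Slicing the list never asks NumPy to infer a shape. With np.array(..., dtype=object), the result is a 1-D array of per-molecule arrays only because the lengths differ; if every entry had the same shape, NumPy would build a 3-D object array instead.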