deepmodeling / deepmd-kit

A deep learning package for many-body potential energy representation and molecular dynamics
https://docs.deepmodeling.com/projects/deepmd/
GNU Lesser General Public License v3.0

[BUG] RuntimeError: inconsistent type map #4034

Closed Cloudac7 closed 3 weeks ago

Cloudac7 commented 1 month ago

Bug summary

In 3.0.0b3, running a multitask training or fine-tuning task fails with a RuntimeError reporting an inconsistent type map, while the same case runs without problems on code installed from the 2024Q1 branch. The original input.json is attached to help identify the bug.

 Traceback (most recent call last):
    File "/public/home/ypliucat/.conda/envs/deepmd-kit-3.0.0b3/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
      return f(*args, **kwargs)
    File "/public/home/ypliucat/.conda/envs/deepmd-kit-3.0.0b3/lib/python3.10/site-packages/deepmd/pt/entrypoints/main.py", line 562, in main
      train(FLAGS)
    File "/public/home/ypliucat/.conda/envs/deepmd-kit-3.0.0b3/lib/python3.10/site-packages/deepmd/pt/entrypoints/main.py", line 311, in train
      train_data = get_data(
    File "/public/home/ypliucat/.conda/envs/deepmd-kit-3.0.0b3/lib/python3.10/site-packages/deepmd/utils/data_system.py", line 802, in get_data
      data = DeepmdDataSystem(
    File "/public/home/ypliucat/.conda/envs/deepmd-kit-3.0.0b3/lib/python3.10/site-packages/deepmd/utils/data_system.py", line 184, in __init__
      self.type_map = self._check_type_map_consistency(type_map_list)
    File "/public/home/ypliucat/.conda/envs/deepmd-kit-3.0.0b3/lib/python3.10/site-packages/deepmd/utils/data_system.py", line 616, in _check_type_map_consistency
      raise RuntimeError(f"inconsistent type map: {ret!s} {ii!s}")
  RuntimeError: inconsistent type map: ['Ag', 'Cu'] ['Ag', 'Ni']

A possible solution to this issue is proposed in #4031, although that change does not target the error raised here directly.

DeePMD-kit Version

3.0.0b3

Backend and its version

PyTorch v2.0.0.post200, TensorFlow v2.14.0

How did you download the software?

Offline packages

Input Files, Running Commands, Error Log, etc.

input.json

Steps to Reproduce

Run a multitask training using the dataset from Domains_Cluster.

Further Information, Files, and Links

No response

njzjz commented 1 month ago

When the type map is not given (which is a bug) and the data systems have different type maps, raising an error is expected; otherwise we don't know which type map should be used for the model. So #4031 should be the correct way to fix it. However, the error message should be improved: the current one is written for developers, not users.
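
For illustration, the check described above and a more user-oriented message could look roughly like the sketch below. This is not the DeePMD-kit implementation: the function name check_type_map_consistency, the rule that two type maps agree when one is a prefix of the other, and the wording of the message are all assumptions made for this example.

```python
from typing import Optional, Sequence


def check_type_map_consistency(
    type_map_list: Sequence[Optional[Sequence[str]]],
) -> list[str]:
    """Merge per-system type maps, or raise an error aimed at users.

    Hypothetical helper: two type maps are treated as consistent when
    one is a prefix of the other. Systems without a type map (None)
    are skipped.
    """
    merged: list[str] = []
    for system_index, type_map in enumerate(type_map_list):
        if type_map is None:
            continue
        # Compare only the part that both type maps have in common.
        common = min(len(type_map), len(merged))
        if list(type_map[:common]) != merged[:common]:
            # Assumed wording: say what conflicts and what the user can do.
            raise RuntimeError(
                f"Inconsistent type maps in the data: system {system_index} "
                f"uses {list(type_map)}, which conflicts with {merged} "
                "from earlier systems. Provide an explicit type_map in the "
                "input file that covers every element."
            )
        # Keep the longest type map seen so far.
        if len(type_map) > len(merged):
            merged = list(type_map)
    return merged
```

Called with the two type maps from the traceback, ['Ag', 'Cu'] and ['Ag', 'Ni'], this sketch raises a RuntimeError that names the conflicting system and suggests providing an explicit type map, rather than only printing the two lists.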