aiidateam / aiida-quantumespresso

The official AiiDA plugin for Quantum ESPRESSO
https://aiida-quantumespresso.readthedocs.io

PDOSWorkChain excepts due to daemon restart #724

Closed t-reents closed 3 years ago

t-reents commented 3 years ago

Hi,

I encountered a problem in the PDOSWorkChain which I think is related to the following issue from aiida-core: aiidateam/aiida-core#5124. The error occurs when I restart the daemon or increase/decrease the number of workers while the workchain is uploading the DOS and PDOS calculations.

The error message is as follows:

2021-09-10 10:42:09 [16716 | REPORT]: [40300|PdosWorkChain|run_pdos_parallel]: launching ProjwfcCalculation<40328>
2021-09-10 10:43:59 [16717 |  ERROR]: Traceback (most recent call last):
  File "/home/treents/.venvs/aiida/lib/python3.8/site-packages/aiida/engine/persistence.py", line 124, in load_checkpoint
    bundle = serialize.deserialize(checkpoint)
  File "/home/treents/.venvs/aiida/lib/python3.8/site-packages/aiida/orm/utils/serialize.py", line 230, in deserialize
    return yaml.load(serialized, Loader=AiiDALoader)
  File "/home/treents/.venvs/aiida/lib/python3.8/site-packages/yaml/__init__.py", line 114, in load
    return loader.get_single_data()
  File "/home/treents/.venvs/aiida/lib/python3.8/site-packages/yaml/constructor.py", line 43, in get_single_data
    return self.construct_document(node)
  File "/home/treents/.venvs/aiida/lib/python3.8/site-packages/yaml/constructor.py", line 47, in construct_document
    data = self.construct_object(node)
  File "/home/treents/.venvs/aiida/lib/python3.8/site-packages/yaml/constructor.py", line 92, in construct_object
    data = constructor(self, node)
  File "/home/treents/.venvs/aiida/lib/python3.8/site-packages/aiida/orm/utils/serialize.py", line 156, in bundle_constructor
    yaml_node = loader.construct_mapping(bundle)
  File "/home/treents/.venvs/aiida/lib/python3.8/site-packages/yaml/constructor.py", line 210, in construct_mapping
    return super().construct_mapping(node, deep=deep)
  File "/home/treents/.venvs/aiida/lib/python3.8/site-packages/yaml/constructor.py", line 135, in construct_mapping
    value = self.construct_object(value_node, deep=deep)
  File "/home/treents/.venvs/aiida/lib/python3.8/site-packages/yaml/constructor.py", line 92, in construct_object
    data = constructor(self, node)
  File "/home/treents/.venvs/aiida/lib/python3.8/site-packages/aiida/orm/utils/serialize.py", line 131, in mapping_constructor
    yaml_node = loader.construct_mapping(mapping, deep=True)
  File "/home/treents/.venvs/aiida/lib/python3.8/site-packages/yaml/constructor.py", line 210, in construct_mapping
    return super().construct_mapping(node, deep=deep)
  File "/home/treents/.venvs/aiida/lib/python3.8/site-packages/yaml/constructor.py", line 135, in construct_mapping
    value = self.construct_object(value_node, deep=deep)
  File "/home/treents/.venvs/aiida/lib/python3.8/site-packages/yaml/constructor.py", line 94, in construct_object
    data = constructor(self, tag_suffix, node)
  File "/home/treents/.venvs/aiida/lib/python3.8/site-packages/yaml/constructor.py", line 624, in construct_python_object_apply
    instance = self.make_python_instance(suffix, node, args, kwds, newobj)
  File "/home/treents/.venvs/aiida/lib/python3.8/site-packages/yaml/constructor.py", line 568, in make_python_instance
    raise ConstructorError("while constructing a Python instance", node.start_mark,
yaml.constructor.ConstructorError: while constructing a Python instance
expected a class, but found <class 'builtin_function_or_method'>
  in "<unicode string>", line 26, column 14:
      nscf_emax: !!python/object/apply:numpy.core ...
                 ^

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/treents/.venvs/aiida/lib/python3.8/site-packages/aiida/manage/external/rmq.py", line 208, in _continue
    result = await super()._continue(communicator, pid, nowait, tag)
  File "/home/treents/.venvs/aiida/lib/python3.8/site-packages/plumpy/process_comms.py", line 606, in _continue
    saved_state = self._persister.load_checkpoint(pid, tag)
  File "/home/treents/.venvs/aiida/lib/python3.8/site-packages/aiida/engine/persistence.py", line 126, in load_checkpoint
    raise PersistenceError(f'Failed to load the checkpoint for process<{pid}>: {traceback.format_exc()}')
plumpy.exceptions.PersistenceError: Failed to load the checkpoint for process<40300>: Traceback (most recent call last):
  File "/home/treents/.venvs/aiida/lib/python3.8/site-packages/aiida/engine/persistence.py", line 124, in load_checkpoint
    bundle = serialize.deserialize(checkpoint)
  File "/home/treents/.venvs/aiida/lib/python3.8/site-packages/aiida/orm/utils/serialize.py", line 230, in deserialize
    return yaml.load(serialized, Loader=AiiDALoader)
  File "/home/treents/.venvs/aiida/lib/python3.8/site-packages/yaml/__init__.py", line 114, in load
    return loader.get_single_data()
  File "/home/treents/.venvs/aiida/lib/python3.8/site-packages/yaml/constructor.py", line 43, in get_single_data
    return self.construct_document(node)
  File "/home/treents/.venvs/aiida/lib/python3.8/site-packages/yaml/constructor.py", line 47, in construct_document
    data = self.construct_object(node)
  File "/home/treents/.venvs/aiida/lib/python3.8/site-packages/yaml/constructor.py", line 92, in construct_object
    data = constructor(self, node)
  File "/home/treents/.venvs/aiida/lib/python3.8/site-packages/aiida/orm/utils/serialize.py", line 156, in bundle_constructor
    yaml_node = loader.construct_mapping(bundle)
  File "/home/treents/.venvs/aiida/lib/python3.8/site-packages/yaml/constructor.py", line 210, in construct_mapping
    return super().construct_mapping(node, deep=deep)
  File "/home/treents/.venvs/aiida/lib/python3.8/site-packages/yaml/constructor.py", line 135, in construct_mapping
    value = self.construct_object(value_node, deep=deep)
  File "/home/treents/.venvs/aiida/lib/python3.8/site-packages/yaml/constructor.py", line 92, in construct_object
    data = constructor(self, node)
  File "/home/treents/.venvs/aiida/lib/python3.8/site-packages/aiida/orm/utils/serialize.py", line 131, in mapping_constructor
    yaml_node = loader.construct_mapping(mapping, deep=True)
  File "/home/treents/.venvs/aiida/lib/python3.8/site-packages/yaml/constructor.py", line 210, in construct_mapping
    return super().construct_mapping(node, deep=deep)
  File "/home/treents/.venvs/aiida/lib/python3.8/site-packages/yaml/constructor.py", line 135, in construct_mapping
    value = self.construct_object(value_node, deep=deep)
  File "/home/treents/.venvs/aiida/lib/python3.8/site-packages/yaml/constructor.py", line 94, in construct_object
    data = constructor(self, tag_suffix, node)
  File "/home/treents/.venvs/aiida/lib/python3.8/site-packages/yaml/constructor.py", line 624, in construct_python_object_apply
    instance = self.make_python_instance(suffix, node, args, kwds, newobj)
  File "/home/treents/.venvs/aiida/lib/python3.8/site-packages/yaml/constructor.py", line 568, in make_python_instance
    raise ConstructorError("while constructing a Python instance", node.start_mark,
yaml.constructor.ConstructorError: while constructing a Python instance
expected a class, but found <class 'builtin_function_or_method'>
  in "<unicode string>", line 26, column 14:
      nscf_emax: !!python/object/apply:numpy.core ...

sphuber commented 3 years ago

This is indeed a problem with numpy objects stored in the context: by default they cannot be deserialized, which is what happens when the daemon is restarted or new workers are added and the checkpoint is reloaded. However, we added compatibility for this in aiida-core==1.6.5 (see this PR). Are you on an older version, perhaps? If so, try upgrading and running it again. I think that should solve the issue.
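For installations that cannot be upgraded right away, one possible workaround is to convert numpy values to builtin Python types before they are placed in the workchain context, so that the checkpoint contains only plain YAML. The following is a minimal sketch under that assumption, not the fix that went into aiida-core. The helper name `to_builtin` is hypothetical, and it relies only on the `tolist()` method that numpy scalars and arrays provide.

```python
def to_builtin(value):
    """Recursively convert numpy-like values to builtin Python types.

    Hypothetical helper: anything exposing ``tolist()`` (numpy scalars and
    arrays do) is converted through it; dicts, lists and tuples are walked
    recursively; everything else is returned unchanged.
    """
    # Check ``tolist`` first: numpy.float64 subclasses float, so an
    # isinstance check against float would let it through unconverted.
    tolist = getattr(value, "tolist", None)
    if callable(tolist):
        return tolist()
    if isinstance(value, dict):
        return {key: to_builtin(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_builtin(item) for item in value]
    return value


# Assumed usage inside a workchain step (``nscf_emax`` is the context key
# named in the traceback above; the assignment itself is illustrative):
#     self.ctx.nscf_emax = to_builtin(nscf_emax)
```

Note that tuples come back as lists, which is usually acceptable for values that only need to survive serialization.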

t-reents commented 3 years ago

Thanks for the quick reply @sphuber. I am using version 1.6.4, so I will try to upgrade it and check it again.
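A quick way to check whether an environment already has the compatibility fix is to compare the installed version against 1.6.5. This is a minimal sketch using only the standard library. The function names `release_tuple` and `has_numpy_checkpoint_fix` are hypothetical, and the parsing handles only the leading numeric part of a version string.

```python
def release_tuple(version):
    """Parse the leading numeric 'X.Y.Z' part of a version string into ints."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def has_numpy_checkpoint_fix(version, minimum=(1, 6, 5)):
    """Return True if ``version`` is at least the release named in this thread."""
    return release_tuple(version) >= minimum


print(has_numpy_checkpoint_fix("1.6.4"))  # version reported in this thread
print(has_numpy_checkpoint_fix("1.6.5"))  # release that added the fix

# To test the installed package instead (Python 3.8+):
#     from importlib.metadata import version
#     has_numpy_checkpoint_fix(version("aiida-core"))
```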

sphuber commented 3 years ago

Great. Did you get a chance to try it and did it work?

t-reents commented 3 years ago

Sorry for the late reply @sphuber. I tested it over the last few days and it works!

sphuber commented 3 years ago

Great, thanks a lot. Then I will close this issue.