greschd opened 4 years ago
Pinging @CasperWA for comments as he's been working a lot with the migrations of export files
@greschd could you please quickly check all the outgoing links (type and label pairs) and report them here? In the past it was possible to have both a `return` and a `create` outgoing link from the same node. However, the migration tools should detect this specific case and not complain, so it's maybe something else.
From the `data.json`, I checked the links where `(input, label, type)` is not unique:

- `input_work` and `input_calc` links. Those are harmless I think, it's just the same input being passed to different sub-workflows/calculations.
- `create` links, all from a `CalcFunctionNode` to a `Dict`. AFAICT these are what is causing this issue.

I think the code producing these links is here, although I'm not 100% sure this is the right commit version: https://github.com/greschd/aiida-tbextraction/blob/034f8a6ceaa300118064f1eb287037e310f3b45f/aiida_tbextraction/fp_run/_helpers/_inline_calcs.py

The labels of the offending links are `{'bands', 'kpoints', 'parameters', 'result', 'wannier_parameters'}`.
Also, I was using a sketchy development version of AiiDA at the time - it's entirely possible this is some edge case related to that.
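For anyone wanting to repeat the check, the duplicate detection above can be sketched in a few lines of plain Python. The `links_uuid` key and its `input`/`label`/`type` fields are my recollection of the v0.3 export format and should be double-checked against an actual `data.json`:

```python
from collections import Counter

def find_duplicate_links(links):
    """Return the (input, label, type) triples that occur more than once.

    ``links`` is the list under the "links_uuid" key of an export's
    data.json (field names assumed from the v0.3 export format).
    """
    counts = Counter((link['input'], link['label'], link['type']) for link in links)
    return {triple: num for triple, num in counts.items() if num > 1}

# Two CREATE links from the same node with the same label are flagged:
links = [
    {'input': 'calc-uuid', 'output': 'data-1', 'label': 'result', 'type': 'create'},
    {'input': 'calc-uuid', 'output': 'data-2', 'label': 'result', 'type': 'create'},
    {'input': 'data-1', 'output': 'wf-uuid', 'label': 'bands', 'type': 'input_work'},
]
print(find_duplicate_links(links))  # {('calc-uuid', 'result', 'create'): 2}
```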
We've tried to tackle the most general cases of these, but getting it just right for a particular graph usually takes a bit of manual labour, since the desired outcome may vary from user to user: e.g., you may not care about the labels, while others might.

Since this problem most likely stems from the migration from v0.3 to v0.4 of the export version, we can try to do this the "hard" way: migrate only to that version first, then go over the export's `data.json` file and adapt whatever you may need / try to implement your use case in the migration.
There might also be a log file somewhere with the list of affected links (perhaps in the folder from where you ran the migrate/export command?).
Yeah, the problem here is that the links aren't affected by the migration at all - so they also don't appear in the `migration-<id>.log` files.
It appears to me the "inline calculation" produced duplicates of the output nodes, which are both linked with the same label.
I'll give it a quick try at reproducing this with a clean `1.0.0a1`. If that doesn't work I think we can close this - I can get around it for my own use case, and unless someone else reports the same problem it might just as well be due to the wonky setup I had at the time.
Found the culprit... it's caching.
In `1.0.0a1`, running an `InlineCalculation` with caching enabled produced two output links: one to the "initial" output and one to a cloned duplicate.
If this is the only place where this can occur I'd be tempted to mark this `wontfix` - it's an alpha release, and I was pretty much the only one using caching back then.
However, this made me remember how you could have duplicate CREATE links in previous versions: outputs did not have to be unique, but then you needed to access them with `linkname_<pk>` to get a specific one:
```python
In [17]: inline_calc.get_outputs_dict()
Out[17]:
{u'res': <Data: uuid: 52bed73b-cabc-49ce-aae1-588479454123 (pk: 15)>,
 u'res_15': <Data: uuid: 52bed73b-cabc-49ce-aae1-588479454123 (pk: 15)>,
 u'res_16': <Data: uuid: c4005890-1bb5-47f4-b1fb-eeb85cfe0144 (pk: 16)>}

In [18]: inline_calc.out.res
Out[18]: <Data: uuid: 52bed73b-cabc-49ce-aae1-588479454123 (pk: 15)>

In [19]: inline_calc.out.res_15
Out[19]: <Data: uuid: 52bed73b-cabc-49ce-aae1-588479454123 (pk: 15)>

In [20]: inline_calc.out.res_16
Out[20]: <Data: uuid: c4005890-1bb5-47f4-b1fb-eeb85cfe0144 (pk: 16)>
```
The link labels don't contain the `_<pk>` suffix:
```python
In [28]: for link in inline_calc.dbnode.output_links.all():
    ...:     print(link, link.label)
    ...:
(<DbLink: InlineCalculation (14) --> Float (16)>, u'res')
(<DbLink: InlineCalculation (14) --> Float (15)>, u'res')
```
Here's the link to the relevant documentation in `0.12.3` (the "note" at the very bottom): https://aiida.readthedocs.io/projects/aiida-core/en/v0.12.3/working_with_aiida/resultmanager.html
If memory serves me right, this could happen in any calculation / workchain / whatever, and the caching issue isn't actually creating an invalid link state as far as the old AiiDA version is concerned.
Maybe a way to fix this would be explicitly adding the `_<pk>` suffix to these link labels - with the "bare" link referring to the lowest-pk descendant (I think this was how it was decided, but that would need checking in the old code).
Hi!
Great that you tracked this!
The `_pk` suffix was added only in Python when using `node.out.`, since uniqueness is not required for outgoing links to data nodes (as you also noted above).
Anyway - I agree. Let's set this to `wontfix` unless someone else has similar issues!
Sorry, I wasn't being totally clear: since the caching / inline calcs are in fact not the only place where this can occur, I would suggest not marking this as `wontfix`.
Uniqueness was not required in the previous version, but now you can only have one `CREATE` link with a given label. I think there should be some workaround for this, since it is "normal" use of the previous AiiDA version.
P.S.: Even if we decide to mark it `wontfix`, the `bug` label would still apply IMO.
Hmm, I think I misunderstood something: duplicate outgoing links with the same label still appear to be valid:

```python
In [9]: from aiida.common.links import LinkType  # import added here; it was missing from the original snippet

In [10]: from aiida.orm import CalcJobNode

In [11]: from aiida.orm import Data

In [12]: calc = CalcJobNode()

In [13]: d1 = Data()

In [14]: d2 = Data()

In [15]: d1.add_incoming(calc, link_type=LinkType.CREATE, link_label='test')

In [16]: d2.add_incoming(calc, link_type=LinkType.CREATE, link_label='test')
```

So... is this just an extra validation in the import that actually shouldn't be there?
Ha, just figured out how I tricked the link validation in the example above: links only show up in `get_outgoing` after the target has been stored:

```python
In [16]: calc = CalcJobNode()

In [17]: calc.store()
Out[17]: <CalcJobNode: uuid: 6e3b970f-cc00-465e-a6c5-67507a3a7740 (pk: 49831)>

In [18]: d1 = Data()

In [19]: d1.add_incoming(calc, link_type=LinkType.CREATE, link_label='test')

In [20]: calc.get_outgoing(link_type=LinkType.CREATE, link_label_filter='test', only_uuid=True).all()
Out[20]: []

In [21]: d1.store()
Out[21]: <Data: uuid: a776a906-dbfc-48d8-8b56-ac2627b07a28 (pk: 49832)>

In [22]: calc.get_outgoing(link_type=LinkType.CREATE, link_label_filter='test', only_uuid=True).all()
Out[22]: [LinkTriple(node='a776a906-dbfc-48d8-8b56-ac2627b07a28', link_type=<LinkType.CREATE: 'create'>, link_label='test')]
```
This is relevant here because `validate_link` uses `get_outgoing` to check for existing links, see https://github.com/aiidateam/aiida-core/blob/develop/aiida/orm/utils/links.py#L163
I'm not sure whether:

- `get_outgoing` should also include unstored links, or
- `validate_link` should not rely on `get_outgoing`.
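To illustrate the gap with a toy model (plain Python, not actual aiida-core code): if the uniqueness check can only see links whose target is already stored, two unstored targets with the same label both pass validation.

```python
class ToyNode:
    """Minimal stand-in for a node: tracks only stored-ness and incoming links."""
    def __init__(self):
        self.stored = False
        self.incoming = []  # list of (source, label) pairs

def get_outgoing(source, all_nodes):
    """Mimics ``get_outgoing``: only links to *stored* targets are visible."""
    return [(node, label) for node in all_nodes if node.stored
            for src, label in node.incoming if src is source]

def validate_link(source, label, all_nodes):
    """A validation that relies on get_outgoing misses links to unstored targets."""
    if any(existing == label for _, existing in get_outgoing(source, all_nodes)):
        raise ValueError('duplicate label: {}'.format(label))

calc, d1, d2 = ToyNode(), ToyNode(), ToyNode()
nodes = [d1, d2]

validate_link(calc, 'test', nodes); d1.incoming.append((calc, 'test'))
validate_link(calc, 'test', nodes); d2.incoming.append((calc, 'test'))  # passes!

d1.stored = d2.stored = True
print(len(get_outgoing(calc, nodes)))  # 2 -- duplicate links slipped through
```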
Tagging @sphuber for comment.
I am confused. I thought this behavior was not allowed and therefore your example should be a bug. If you take your example and then store the nodes, there will be no exception and we have two links that violate the uniqueness constraints:
```python
In [4]: from aiida.orm import CalcJobNode
   ...: from aiida.orm import Data
   ...: calc = CalcJobNode()
   ...: d1 = Data()
   ...: d2 = Data()
   ...: d1.add_incoming(calc, link_type=LinkType.CREATE, link_label='test')
   ...: d2.add_incoming(calc, link_type=LinkType.CREATE, link_label='test')
   ...: calc.store()
Out[4]: <CalcJobNode: uuid: 94def715-33ab-4065-b750-aca73266d0f0 (pk: 11081)>

In [5]: calc.get_outgoing().all()
Out[5]: []

In [6]: d1.store()
Out[6]: <Data: uuid: 023c9af7-328d-43ca-8ef5-05f4f0bb773c (pk: 11082)>

In [7]: calc.get_outgoing().all()
Out[7]: [LinkTriple(node=<Data: uuid: 023c9af7-328d-43ca-8ef5-05f4f0bb773c (pk: 11082)>, link_type=<LinkType.CREATE: 'create'>, link_label='test')]

In [8]: d2.store()
Out[8]: <Data: uuid: e24e1f7f-6337-469b-a624-c1d13aa1c8eb (pk: 11083)>

In [9]: calc.get_outgoing().all()
Out[9]:
[LinkTriple(node=<Data: uuid: 023c9af7-328d-43ca-8ef5-05f4f0bb773c (pk: 11082)>, link_type=<LinkType.CREATE: 'create'>, link_label='test'),
 LinkTriple(node=<Data: uuid: e24e1f7f-6337-469b-a624-c1d13aa1c8eb (pk: 11083)>, link_type=<LinkType.CREATE: 'create'>, link_label='test')]
```
Both links are stored, but that should not have been allowed. That can be shown by calling `calc.outputs`, which expects the labels to be unique:
```python
In [10]: calc.outputs.test
---------------------------------------------------------------------------
MultipleObjectsError                      Traceback (most recent call last)
<ipython-input-10-6f9a623c1bc3> in <module>
----> 1 calc.outputs.test

~/code/aiida/env/dev/aiida-core/aiida/orm/utils/managers.py in __getattr__(self, name)
     81         """
     82         try:
---> 83             return self._get_node_by_link_label(label=name)
     84         except NotExistent:
     85             # Note: in order for TAB-completion to work, we need to raise an

~/code/aiida/env/dev/aiida-core/aiida/orm/utils/managers.py in _get_node_by_link_label(self, label)
     62         if self._incoming:
     63             return self._node.get_incoming(link_type=self._link_type).get_node_by_label(label)
---> 64         return self._node.get_outgoing(link_type=self._link_type).get_node_by_label(label)
     65
     66     def __dir__(self):

~/code/aiida/env/dev/aiida-core/aiida/orm/utils/links.py in get_node_by_label(self, label)
    296         else:
    297             raise exceptions.MultipleObjectsError(
--> 298                 'more than one neighbor with the label {} found'.format(label)
    299             )
    300

MultipleObjectsError: more than one neighbor with the label test found
```
Weirdly, when looking at the tests in `tests/orm/test_node.py` we find the following test:
```python
def test_node_outdegree_unique_triple(self):
    """Test that the validation of links with outdegree `unique_triple` works correctly

    The example here is a `CalculationNode` that has two outgoing CREATE links with the same label, but to different
    target nodes. This is legal and should pass validation.
    """
    creator = CalculationNode().store()
    data_one = Data()
    data_two = Data()

    # Verify that adding two create links with the same link label but to different target is allowed from the
    # perspective of the source node (the CalculationNode in this case)
    data_one.add_incoming(creator, link_type=LinkType.CREATE, link_label='create')
    data_two.add_incoming(creator, link_type=LinkType.CREATE, link_label='create')

    data_one.store()
    data_two.store()

    uuids_outgoing = set(node.uuid for node in creator.get_outgoing().all_nodes())
    uuids_expected = set([data_one.uuid, data_two.uuid])
    self.assertEqual(uuids_outgoing, uuids_expected)
```
which tests exactly the example here and claims this should be fine. However, in the link validation code in `aiida.orm.utils.links.validate_link` we see:
```python
link_mapping = {
    LinkType.CALL_CALC: (WorkflowNode, CalculationNode, 'unique_triple', 'unique'),
    LinkType.CALL_WORK: (WorkflowNode, WorkflowNode, 'unique_triple', 'unique'),
    LinkType.CREATE: (CalculationNode, Data, 'unique_pair', 'unique'),
    LinkType.INPUT_CALC: (Data, CalculationNode, 'unique_triple', 'unique_pair'),
    LinkType.INPUT_WORK: (Data, WorkflowNode, 'unique_triple', 'unique_pair'),
    LinkType.RETURN: (WorkflowNode, Data, 'unique_pair', 'unique_triple'),
}
```
clearly stating that `CalculationNode -> Data` should be `unique_pair`, meaning the link label has to be unique among a node's outgoing `CREATE` links, regardless of the target node. I think the test is just wrong and this is not something we want to allow, and there is a bug in the validation which I haven't found yet. Will report later.
> bug in the validation

The validation relies on the target node being stored, because it uses `get_outgoing`.
I agree that the test is probably just wrong, since it's also pretty much impossible to get into this situation with "normal" user code (workchains, calcjobs, etc.).
While importing an old export file (export version 0.3), the migrations run through without error, but the actual import raises:
This is due to the "unique pair" check, here: https://github.com/aiidateam/aiida-core/blob/develop/aiida/tools/importexport/dbimport/backends/django/__init__.py#L656
I can't quite remember: was it allowed in a previous version of AiiDA to have multiple outgoing links with the same label? If so, what would be a good way to resolve this issue?
In this particular case I can just patch the code to rename the links (since I don't care too much about their names), but in general that might be dangerous.
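For reference, the one-off patch I have in mind would be something along these lines (the `links_uuid` key and its field names follow the v0.3 export format as I remember it, so double-check before use; only safe if the labels don't matter to you):

```python
import json
from collections import defaultdict

def rename_duplicate_create_links(path):
    """Rewrite data.json in place, appending a counter to duplicated CREATE labels.

    Note: the renamed label could in principle collide with another existing
    label; good enough for a quick one-off patch, not a general migration.
    """
    with open(path) as handle:
        data = json.load(handle)

    seen = defaultdict(int)
    for link in data['links_uuid']:
        if link['type'] != 'create':
            continue
        key = (link['input'], link['label'])
        seen[key] += 1
        if seen[key] > 1:  # keep the first occurrence, rename the rest
            link['label'] = '{}_{}'.format(link['label'], seen[key] - 1)

    with open(path, 'w') as handle:
        json.dump(data, handle)
```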
The export I'm trying to import here can be found at https://polybox.ethz.ch/index.php/s/I1se1WGAiP2iLYX, but it's large and also suffers from #3450 (which I fixed with a `sed` on the `data.json`) - so it's not great for testing on.