Closed sphuber closed 5 years ago
The entities that are definitely not preserved when exporting/importing a workchain node:
DbLog
entries, visible through verdi work report
RETURN
linksCALL
linksA few other things that should be imported/exported are
From #687 we know that:
The initial code for descendants was the following qb = QueryBuilder() qb.append(Calculation, tag='high_node', filters={'id': {'in': [185]}}) qb.append(Data, output_of='high_node', project=['*'], edge_filters={'type': {'==': LinkType.CREATE.value}}) qb.all()
We need (regarding the links)
After discussing with Sebastiaan (@sphuber):
So we need to be in a loop where we follow CREATE, RETURN and CALL links until the set does not change anymore
Just a couple of comments:
the constraint on link types goes in the other direction: e.g. a CREATE is always between a calc and a data, but the reverse is not true (between calc and data there could be a CREATE or a RETURN, for instance). Also, at the moment I think one should "merge" Work into Calculation (it's a subclass). E.g. Work can also CREATE data, WORK can call WORK, ...
in the import, depending on the cases one should look for the ancestors rather than descendants. I.e.:
For reference, one can reuse probably some of the logic of the 'verdi node delete' implemented recently, that also has a very similar recursive logic: https://github.com/aiidateam/aiida_core/pull/1083/files (even if the rules might be slightly different).
I think it would be also good to divide 'following' rules in three groups, so we make sure we don't forget use cases:
And for the optional ones we have user options.
One think to discuss is: should we allow (with some "big" warning) to export stuff explicitly breaking the provenance? Originally I was against, but there have been a few use cases where this is actually needed... (e.g. exporting data but leaving out some inputs because they have some license; or exporting only the results to have slimmer databases, if I want to study something, and I want to have the same UUIDs, but I don't care having the full provenance in my local DB). In this case, it would be good to still have somewhere a 'note' that the provenance was in some way cut or broken (in verdi node delete, we actually add an entry to the DbLog table: https://github.com/aiidateam/aiida_core/pull/1083/files#diff-1ddc7a2c3920cfbae75335771b1d0522R164 but probably it is not a good idea for the export, as this will be logged "forever" even if later we import back the provenance. Maybe instead use extras? to discuss
Available AiiDA link discussion can be found at #1174
This is an overview of the export algorithm (that enriches a given set of nodes by the user). It decides which is the final set of nodes and the links to be exported based on the user choices (private nodes, & rules that can be overridden).
Private nodes The set of private nodes is defined by the user. It can contain Data nodes but also Calculation nodes
Node export rules
Link type | Tail & head of the link | Link traversal | Export direct successor (forward link traversal) / direct predecessor (reverse link traversal) | Exceptions | Comments |
---|---|---|---|---|---|
INPUT | (Data, Calculation) | Forward | No | User can modify default behaviour | By default we don’t care if a Data node candidate for export is used as input to Calculations that are not selected yet for export. |
Reversed | Yes | Except Data nodes that are marked explicitly private | We need the predecessors of a Calculation node. Note that the non-direct predecessors will be found by following either INPUT or CREATE links depending on the type predecessor found in the reverse link traversal. | ||
CREATE | (Calculation, Data) | Forward | Yes | We need the direct Data node successors of the Calculation nodes. | |
Reversed | Yes | ? | We need the predecessors of a Data node. Note that the non-direct predecessors will be found by following either INPUT or CREATE links depending on the type predecessor found in the reverse link traversal. | ||
RETURN | (Calculation, Data) | Forward | Yes | The direct successors of a calculation node are exported | |
Reversed | No | We are not interested in Calculations that just returned a Data node but not created it. | |||
CALL | (Calculation [caller], Calculation [called]) | Forward | Yes | We need to know how the top WorkCalculation obtained its results | |
Reversed | No | User can modify default behaviour |
Node export algorithm
Link export rules
For every node in the to_be_exported, export the corresponding links based on the following rules
Link type | Tail & head of the link | Link export |
---|---|---|
INPUT | (Data, Calculation) | Backward, by the Calculation node / Also forward too by the Data node, if the user modifies the default behaviour |
CREATE | (Calculation, Data) | Forward, by the Calculation node / Backward, by the Data node |
RETURN | (Calculation, Data) | Forward, by the Calculation node |
CALL | (Calculation [caller], Calculation [called]) | Forward, by the Calculation [caller] node / Also backward too by the Calculation [called] node, if the user modifies the default behaviour |
At the receiver side:
Thanks! A few comments:
As a remark, as I'm sure there will be new/slighlty different rules and requirements as soon as new users will use this, we should have each link direction/type with an option, to make it easy to change the rules for different usecases.
Also, in the last sentence, I think it should read "The import file should work in an empty database and it should not crash, when the default link/node selection rules are not overridden"
Thanks for the comments!
I think that the exceptions and the private nodes the whole procedure starts to be more complicated than it should be.
I tend to believe that these should be 2 independent issues: a) rules for graph traversal b) private nodes that when found in traversal should not be exported
I also tend believe for the (b) that the private node set should not stop the graph exploration procedure (i.e. a) . I think that it should continue and just the nodes in the private node set should not be exported.
I also tend to believe that if we would like not to export some nodes because some others are marked as private (i.e. a calculation that has as input private data), we should just enrich the set of private nodes with more nodes based on these assumptions.
That way we separate the problem in (clearly defined) sub-problems: (a) graph exploration rules that enrich an initial set of nodes based on graph connectivity (a.1) how we override these rules based on user desires on graph exploration (b) private nodes that should not be exported (b.2) how we enrich the private node set based on how these nodes are connected and our assumptions
Maybe it would be better to discuss face-to-face with some concrete examples.
What we discussed & the override rules can be found at https://github.com/szoupanos/aiida_core/tree/issue_1102_b
The hidden nodes have not been implemented yet. We can check if what we have works as we would like.
@ltalirz @giovannipizzi Did you find the time to test this? Do you continue to experience the problems that you had before?
When I tried to export the DB with 70k CifData, WorkFunction and ParameterData, I got this:
STARTING EXPORT...
STORING DATABASE ENTRIES...
Exporting a total of 209596 db entries, of which 209593 nodes.
STORING NODE ATTRIBUTES...
STORING NODE LINKS...
STORING GROUP ELEMENTS...
STORING DATA...
STORING FILES...
Traceback (most recent call last):
File "/Users/leopold/Applications/miniconda3/envs/aiida_master/bin/verdi", line 9, in <module>
sys.exit(run())
File "/Users/leopold/Personal/Postdoc-MARVEL/repos/aiida/aiida_core/aiida/cmdline/verdilib.py", line 1050, in run
aiida.cmdline.verdilib.exec_from_cmdline(sys.argv)
File "/Users/leopold/Personal/Postdoc-MARVEL/repos/aiida/aiida_core/aiida/cmdline/verdilib.py", line 1035, in exec_from_cmdline
CommandClass.run(*argv[command_position + 1:])
File "/Users/leopold/Personal/Postdoc-MARVEL/repos/aiida/aiida_core/aiida/cmdline/verdilib.py", line 878, in run
exec (f, globals_dict)
File "script_new.py", line 20, in <module>
export(nodes)
File "/Users/leopold/Personal/Postdoc-MARVEL/repos/aiida/aiida_core/aiida/orm/importexport.py", line 2472, in export
export_tree(what, folder=folder, silent=silent, **kwargs)
File "/Users/leopold/Personal/Postdoc-MARVEL/repos/aiida/aiida_core/aiida/orm/importexport.py", line 2221, in export_tree
dest_name='.')
File "/Users/leopold/Personal/Postdoc-MARVEL/repos/aiida/aiida_core/aiida/common/folders.py", line 216, in insert_path
raise ValueError("insert_path can only insert files or paths, not symlinks or the like")
ValueError: insert_path can only insert files or paths, not symlinks or the like
I'm not sure whether this is a problem of the state of my DB or of the new export. l'll check with @yakutovicha that he tests the export with a completely new DB again.
Hello, is there any update on this? Did you have the time to check?
I tried to check out your branch and run a simple verdi export create
but get the following exception:
Traceback (most recent call last):
File "/home/sphuber/.virtualenvs/aiida_legacy/bin/verdi", line 11, in <module>
sys.exit(run())
File "/home/sphuber/code/aiida/env/legacy/aiida-core/aiida/cmdline/verdilib.py", line 1050, in run
aiida.cmdline.verdilib.exec_from_cmdline(sys.argv)
File "/home/sphuber/code/aiida/env/legacy/aiida-core/aiida/cmdline/verdilib.py", line 1035, in exec_from_cmdline
CommandClass.run(*argv[command_position + 1:])
File "/home/sphuber/code/aiida/env/legacy/aiida-core/aiida/cmdline/baseclass.py", line 217, in run
function_to_call(*args[1:])
File "/home/sphuber/code/aiida/env/legacy/aiida-core/aiida/cmdline/commands/exportfile.py", line 31, in cli
verdi()
File "/home/sphuber/.virtualenvs/aiida_legacy/local/lib/python2.7/site-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/home/sphuber/.virtualenvs/aiida_legacy/local/lib/python2.7/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/home/sphuber/.virtualenvs/aiida_legacy/local/lib/python2.7/site-packages/click/core.py", line 1066, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/sphuber/.virtualenvs/aiida_legacy/local/lib/python2.7/site-packages/click/core.py", line 1066, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/sphuber/.virtualenvs/aiida_legacy/local/lib/python2.7/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/sphuber/.virtualenvs/aiida_legacy/local/lib/python2.7/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "/home/sphuber/code/aiida/env/legacy/aiida-core/aiida/cmdline/commands/exportfile.py", line 126, in create
outfile=outfile, overwrite=overwrite, **additional_kwargs
File "/home/sphuber/code/aiida/env/legacy/aiida-core/aiida/orm/importexport.py", line 2438, in export_zip
export_tree(what, folder=folder, silent=silent, **kwargs)
TypeError: export_tree() got an unexpected keyword argument 'also_calc_outputs'
@giovannipizzi The above algorithm expects that the Code is Data instance which is not the case for AiiDA v0.12. I will modify it accordingly and I will change it back when we change the AiiDA hierarchy to what we have agreed (Code is an instance, and subclass, of Data)
Things go well. I exported the following groups from a test database, I imported them to an empty one and all nodes and links are exported as expected (checked all nodes and all links of the imported nodes of the imported groups)
PK GroupName NumNodes User
---- ------------------------------------------------------------------------------------- ---------- -----------------------
200 structures_final_test_bench_small_v2_insulating_magnetic 12 nm@l.h
201 structures_final_test_bench_small_v2_insulating 15 nm@l.h
202 structures_final_test_bench_small_v2 54 nm@l.h
203 pw_scf_final_test_bench_small_v2_metallic_magnetic 12 nm@l.h
204 pw_scf_final_test_bench_small_v2_metallic 15 nm@l.h
205 pw_scf_final_test_bench_small_v2_insulating_magnetic 12 nm@l.h
206 pw_scf_final_test_bench_small_v2_insulating 15 nm@l.h
207 structures_final_test_bench_small_v2_metallic 15 nm@l.h
208 structures_final_test_bench_small_v2_metallic_magnetic 12 nm@l.h
209 pw_scf_final_test_bench_small_v2 54 nm@l.h
211 seekpath_structures_final_test_bench_small_v2 54 am@l.h
213 workchain_out_1 54 am@l.h
233 w90wc_trial_1 20 am@l.h
270 w90_scdm_insulators_v1 6 am@l.h
273 w90_scdm_insulators_v2 15 am@l.h
278 w90_scdm_insulators_v3 14 am@l.h
284 w90_scdm_metals_v1 12 am@l.h
294 w90_scdm_insulators_v3_mlwf 14 am@l.h
333 test_topowf 2 am@l.h
335 w90_scdm_insulators_v3_mlwf_convtol 13 am@l.h
338 w90_scdm_insulators_v3_mlwf_convtol_denser 4 am@l.h
365 w90_scdm_metals_v1_mlwf 12 am@l.h
368 w90_scdm_metals_v1_dis 12 am@l.h
369 w90_scdm_metals_v1_dis_mlwf 12 am@l.h
373 structures_gptest_1 1 gp@l.h
374 gptest_out_1 2 gp@l.h
380 w90_scdm_v4_dense_insulating_mlwf 14 gp@l.h
383 w90_scdm_v4_dense_withconduction_mlwf_insulating 10 gp@l.h
390 gptest_out_w90_1_withconduction_mlwf 1 gp@l.h
394 w90_scdm_v4_dense_withconduction_mlwf_insulating_projectability090 10 gp@l.h
395 w90_scdm_v4_dense_withconduction_mlwf_metallic_projectability090 12 gp@l.h
398 w90_scdm_v4_dense_withconduction_mlwf_insulating_projectability095-sigma4 10 gp@l.h
399 w90_scdm_v4_dense_withconduction_mlwf_metallic_projectability095-sigma4 12 gp@l.h
406 w90_scdm_v4_dense_withconduction_mlwf_insulating_projectability095-sigma4-sigmashift3 14 gp@l.h
407 w90_scdm_v4_dense_withconduction_mlwf_metallic_projectability095-sigma4-sigmashift3 14 gp@l.h
414 w90_scdm_v4_dense_withconduction_mlwf_insulating_musigmaauto1-sigmashift3 14 gp@l.h
415 w90_scdm_v4_dense_withconduction_mlwf_insulating_musigmaauto1-sigmashift0 14 gp@l.h
416 w90_scdm_v4_dense_withconduction_mlwf_metallic_musigmaauto1-sigmashift0 14 gp@l.h
417 w90_scdm_v4_dense_withconduction_mlwf_metallic_musigmaauto1-sigmashift3 14 gp@l.h
418 gptest_out_w90_1_withconduction_mlwf_musigmaauto1-sigmashift0 1 gp@l.h
419 gptest_out_w90_1_withconduction_mlwf_musigmaauto1-sigmashift3 1 gp@l.h
420 workchain_out_2 54 am@l.h
I will do some ping-pong exchange of data to see if I get correct results too.
The work on the new export algorithm that supports workcalculation/workchain nodes and the respective link rules will be implemented under the ticket #1758
The work of the export & import of DbLog
entries will be performed at #1759
Currently the export and import do not respect all the properties of
WorkCalculation
nodes. We will have to make sure that all links are properly exported and imported and that theDbLog
entries are also copied over, which are used for thereport
functionality of theWorkChains