aiidateam / aiida-core

The official repository for the AiiDA code
https://aiida-core.readthedocs.io
Other
414 stars 186 forks source link

Fix import/export for workchains and workfunctions #1102

Closed sphuber closed 5 years ago

sphuber commented 6 years ago

Currently the export and import do not respect all the properties of WorkCalculation nodes. We will have to make sure that all links are properly exported and imported and that the DbLog entries are also copied over, which are used for the report functionality of the WorkChains

sphuber commented 6 years ago

The entities that are definitely not preserved when exporting/importing a workchain node:

giovannipizzi commented 6 years ago

A few other things that should be imported/exported are

szoupanos commented 6 years ago

From #687 we know that:

The initial code for descendants was the following qb = QueryBuilder() qb.append(Calculation, tag='high_node', filters={'id': {'in': [185]}}) qb.append(Data, output_of='high_node', project=['*'], edge_filters={'type': {'==': LinkType.CREATE.value}}) qb.all()

We need (regarding the links)

After discussing with Sebastiaan (@sphuber):

So we need to be in a loop where we follow CREATE, RETURN and CALL links until the set does not change anymore

giovannipizzi commented 6 years ago

Just a couple of comments:

  1. the constraint on link types goes in the other direction: e.g. a CREATE is always between a calc and a data, but the reverse is not true (between calc and data there could be a CREATE or a RETURN, for instance). Also, at the moment I think one should "merge" Work into Calculation (it's a subclass). E.g. Work can also CREATE data, WORK can call WORK, ...

  2. in the import, depending on the cases one should look for the ancestors rather than descendants. I.e.:

    • if I export a data node, I also want its parents, recursively, to keep the provenance (following CREATE+INPUT links).
    • On the other hand, I think if I have a calculation, beside exporting its inputs (and possibly recursively up, as discussed in the previous point), I want to see all its outputs (CREATE+RETURN) but only 1 level deep, and typically not go further down.
    • If I have a calculation, I want to follow (recursively) all the CALL links going down (I don't think that by default one also wants to go up, but maybe this is a useful option for users).
    • And as you say, this should be done in a loop (until the set does not change).

For reference, one can reuse probably some of the logic of the 'verdi node delete' implemented recently, that also has a very similar recursive logic: https://github.com/aiidateam/aiida_core/pull/1083/files (even if the rules might be slightly different).

I think it would be also good to divide 'following' rules in three groups, so we make sure we don't forget use cases:

And for the optional ones we have user options.

One think to discuss is: should we allow (with some "big" warning) to export stuff explicitly breaking the provenance? Originally I was against, but there have been a few use cases where this is actually needed... (e.g. exporting data but leaving out some inputs because they have some license; or exporting only the results to have slimmer databases, if I want to study something, and I want to have the same UUIDs, but I don't care having the full provenance in my local DB). In this case, it would be good to still have somewhere a 'note' that the provenance was in some way cut or broken (in verdi node delete, we actually add an entry to the DbLog table: https://github.com/aiidateam/aiida_core/pull/1083/files#diff-1ddc7a2c3920cfbae75335771b1d0522R164 but probably it is not a good idea for the export, as this will be logged "forever" even if later we import back the provenance. Maybe instead use extras? to discuss

szoupanos commented 6 years ago

Available AiiDA link discussion can be found at #1174

szoupanos commented 6 years ago

This is an overview of the export algorithm (that enriches a given set of nodes by the user). It decides which is the final set of nodes and the links to be exported based on the user choices (private nodes, & rules that can be overridden).


Private nodes The set of private nodes is defined by the user. It can contain Data nodes but also Calculation nodes

Node export rules

Link type Tail & head of the link Link traversal Export direct successor (forward link traversal) / direct predecessor (reverse link traversal) Exceptions Comments
INPUT (Data, Calculation) Forward No User can modify default behaviour By default we don’t care if a Data node candidate for export is used as input to Calculations that are not selected yet for export.
    Reversed Yes Except Data nodes that are marked explicitly private We need the predecessors of a Calculation node. Note that the non-direct predecessors will be found by following either INPUT or CREATE links depending on the type predecessor found in the reverse link traversal.
CREATE (Calculation, Data) Forward Yes   We need the direct Data node successors of the Calculation nodes.
    Reversed Yes ? We need the predecessors of a Data node. Note that the non-direct predecessors will be found by following either INPUT or CREATE links depending on the type predecessor found in the reverse link traversal.
RETURN (Calculation, Data) Forward Yes   The direct successors of a calculation node are exported
    Reversed No   We are not interested in Calculations that just returned a Data node but not created it.
CALL (Calculation [caller], Calculation [called]) Forward Yes   We need to know how the top WorkCalculation obtained its results
    Reversed No User can modify default behaviour  

Node export algorithm

Link export rules

For every node in the to_be_exported, export the corresponding links based on the following rules

Link type Tail & head of the link Link export
INPUT (Data, Calculation) Backward, by the Calculation node / Also forward too by the Data node, if the user modifies the default behaviour
CREATE (Calculation, Data) Forward, by the Calculation node / Backward, by the Data node
RETURN (Calculation, Data) Forward, by the Calculation node
CALL (Calculation [caller], Calculation [called]) Forward, by the Calculation [caller] node / Also backward too by the Calculation [called] node, if the user modifies the default behaviour

At the receiver side:

giovannipizzi commented 6 years ago

Thanks! A few comments:

As a remark, as I'm sure there will be new/slighlty different rules and requirements as soon as new users will use this, we should have each link direction/type with an option, to make it easy to change the rules for different usecases.

Also, in the last sentence, I think it should read "The import file should work in an empty database and it should not crash, when the default link/node selection rules are not overridden"

szoupanos commented 6 years ago

Thanks for the comments!

I think that the exceptions and the private nodes the whole procedure starts to be more complicated than it should be.

I tend to believe that these should be 2 independent issues: a) rules for graph traversal b) private nodes that when found in traversal should not be exported

I also tend believe for the (b) that the private node set should not stop the graph exploration procedure (i.e. a) . I think that it should continue and just the nodes in the private node set should not be exported.

I also tend to believe that if we would like not to export some nodes because some others are marked as private (i.e. a calculation that has as input private data), we should just enrich the set of private nodes with more nodes based on these assumptions.

That way we separate the problem in (clearly defined) sub-problems: (a) graph exploration rules that enrich an initial set of nodes based on graph connectivity (a.1) how we override these rules based on user desires on graph exploration (b) private nodes that should not be exported (b.2) how we enrich the private node set based on how these nodes are connected and our assumptions

Maybe it would be better to discuss face-to-face with some concrete examples.

szoupanos commented 6 years ago

What we discussed & the override rules can be found at https://github.com/szoupanos/aiida_core/tree/issue_1102_b

The hidden nodes have not been implemented yet. We can check if what we have works as we would like.

szoupanos commented 6 years ago

@ltalirz @giovannipizzi Did you find the time to test this? Do you continue to experience the problems that you had before?

ltalirz commented 6 years ago

When I tried to export the DB with 70k CifData, WorkFunction and ParameterData, I got this:

STARTING EXPORT...
STORING DATABASE ENTRIES...
Exporting a total of 209596 db entries, of which 209593 nodes.
STORING NODE ATTRIBUTES...
STORING NODE LINKS...
STORING GROUP ELEMENTS...
STORING DATA...
STORING FILES...
Traceback (most recent call last):
  File "/Users/leopold/Applications/miniconda3/envs/aiida_master/bin/verdi", line 9, in <module>
    sys.exit(run())
  File "/Users/leopold/Personal/Postdoc-MARVEL/repos/aiida/aiida_core/aiida/cmdline/verdilib.py", line 1050, in run
    aiida.cmdline.verdilib.exec_from_cmdline(sys.argv)
  File "/Users/leopold/Personal/Postdoc-MARVEL/repos/aiida/aiida_core/aiida/cmdline/verdilib.py", line 1035, in exec_from_cmdline
    CommandClass.run(*argv[command_position + 1:])
  File "/Users/leopold/Personal/Postdoc-MARVEL/repos/aiida/aiida_core/aiida/cmdline/verdilib.py", line 878, in run
    exec (f, globals_dict)
  File "script_new.py", line 20, in <module>
    export(nodes)
  File "/Users/leopold/Personal/Postdoc-MARVEL/repos/aiida/aiida_core/aiida/orm/importexport.py", line 2472, in export
    export_tree(what, folder=folder, silent=silent, **kwargs)
  File "/Users/leopold/Personal/Postdoc-MARVEL/repos/aiida/aiida_core/aiida/orm/importexport.py", line 2221, in export_tree
    dest_name='.')
  File "/Users/leopold/Personal/Postdoc-MARVEL/repos/aiida/aiida_core/aiida/common/folders.py", line 216, in insert_path
    raise ValueError("insert_path can only insert files or paths, not symlinks or the like")
ValueError: insert_path can only insert files or paths, not symlinks or the like

I'm not sure whether this is a problem of the state of my DB or of the new export. l'll check with @yakutovicha that he tests the export with a completely new DB again.

szoupanos commented 6 years ago

Hello, is there any update on this? Did you have the time to check?

sphuber commented 6 years ago

I tried to check out your branch and run a simple verdi export create but get the following exception:

Traceback (most recent call last):
  File "/home/sphuber/.virtualenvs/aiida_legacy/bin/verdi", line 11, in <module>
    sys.exit(run())
  File "/home/sphuber/code/aiida/env/legacy/aiida-core/aiida/cmdline/verdilib.py", line 1050, in run
    aiida.cmdline.verdilib.exec_from_cmdline(sys.argv)
  File "/home/sphuber/code/aiida/env/legacy/aiida-core/aiida/cmdline/verdilib.py", line 1035, in exec_from_cmdline
    CommandClass.run(*argv[command_position + 1:])
  File "/home/sphuber/code/aiida/env/legacy/aiida-core/aiida/cmdline/baseclass.py", line 217, in run
    function_to_call(*args[1:])
  File "/home/sphuber/code/aiida/env/legacy/aiida-core/aiida/cmdline/commands/exportfile.py", line 31, in cli
    verdi()
  File "/home/sphuber/.virtualenvs/aiida_legacy/local/lib/python2.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/sphuber/.virtualenvs/aiida_legacy/local/lib/python2.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/sphuber/.virtualenvs/aiida_legacy/local/lib/python2.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/sphuber/.virtualenvs/aiida_legacy/local/lib/python2.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/sphuber/.virtualenvs/aiida_legacy/local/lib/python2.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/sphuber/.virtualenvs/aiida_legacy/local/lib/python2.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/home/sphuber/code/aiida/env/legacy/aiida-core/aiida/cmdline/commands/exportfile.py", line 126, in create
    outfile=outfile, overwrite=overwrite, **additional_kwargs
  File "/home/sphuber/code/aiida/env/legacy/aiida-core/aiida/orm/importexport.py", line 2438, in export_zip
    export_tree(what, folder=folder, silent=silent, **kwargs)
TypeError: export_tree() got an unexpected keyword argument 'also_calc_outputs'
szoupanos commented 5 years ago

@giovannipizzi The above algorithm expects that the Code is Data instance which is not the case for AiiDA v0.12. I will modify it accordingly and I will change it back when we change the AiiDA hierarchy to what we have agreed (Code is an instance, and subclass, of Data)

szoupanos commented 5 years ago

Things go well. I exported the following groups from a test database, I imported them to an empty one and all nodes and links are exported as expected (checked all nodes and all links of the imported nodes of the imported groups)

 PK  GroupName                                                                                NumNodes  User
----  -------------------------------------------------------------------------------------  ----------  -----------------------
 200  structures_final_test_bench_small_v2_insulating_magnetic                                       12  nm@l.h
 201  structures_final_test_bench_small_v2_insulating                                                15  nm@l.h
 202  structures_final_test_bench_small_v2                                                           54  nm@l.h
 203  pw_scf_final_test_bench_small_v2_metallic_magnetic                                             12  nm@l.h
 204  pw_scf_final_test_bench_small_v2_metallic                                                      15  nm@l.h
 205  pw_scf_final_test_bench_small_v2_insulating_magnetic                                           12  nm@l.h
 206  pw_scf_final_test_bench_small_v2_insulating                                                    15  nm@l.h
 207  structures_final_test_bench_small_v2_metallic                                                  15  nm@l.h
 208  structures_final_test_bench_small_v2_metallic_magnetic                                         12  nm@l.h
 209  pw_scf_final_test_bench_small_v2                                                               54  nm@l.h
 211  seekpath_structures_final_test_bench_small_v2                                                  54  am@l.h
 213  workchain_out_1                                                                                54  am@l.h
 233  w90wc_trial_1                                                                                  20  am@l.h
 270  w90_scdm_insulators_v1                                                                          6  am@l.h
 273  w90_scdm_insulators_v2                                                                         15  am@l.h
 278  w90_scdm_insulators_v3                                                                         14  am@l.h
 284  w90_scdm_metals_v1                                                                             12  am@l.h
 294  w90_scdm_insulators_v3_mlwf                                                                    14  am@l.h
 333  test_topowf                                                                                     2  am@l.h
 335  w90_scdm_insulators_v3_mlwf_convtol                                                            13  am@l.h
 338  w90_scdm_insulators_v3_mlwf_convtol_denser                                                      4  am@l.h
 365  w90_scdm_metals_v1_mlwf                                                                        12  am@l.h
 368  w90_scdm_metals_v1_dis                                                                         12  am@l.h
 369  w90_scdm_metals_v1_dis_mlwf                                                                    12  am@l.h
 373  structures_gptest_1                                                                             1  gp@l.h
 374  gptest_out_1                                                                                    2  gp@l.h
 380  w90_scdm_v4_dense_insulating_mlwf                                                              14  gp@l.h
 383  w90_scdm_v4_dense_withconduction_mlwf_insulating                                               10  gp@l.h
 390  gptest_out_w90_1_withconduction_mlwf                                                            1  gp@l.h
 394  w90_scdm_v4_dense_withconduction_mlwf_insulating_projectability090                             10  gp@l.h
 395  w90_scdm_v4_dense_withconduction_mlwf_metallic_projectability090                               12  gp@l.h
 398  w90_scdm_v4_dense_withconduction_mlwf_insulating_projectability095-sigma4                      10  gp@l.h
 399  w90_scdm_v4_dense_withconduction_mlwf_metallic_projectability095-sigma4                        12  gp@l.h
 406  w90_scdm_v4_dense_withconduction_mlwf_insulating_projectability095-sigma4-sigmashift3          14  gp@l.h
 407  w90_scdm_v4_dense_withconduction_mlwf_metallic_projectability095-sigma4-sigmashift3            14  gp@l.h
 414  w90_scdm_v4_dense_withconduction_mlwf_insulating_musigmaauto1-sigmashift3                      14  gp@l.h
 415  w90_scdm_v4_dense_withconduction_mlwf_insulating_musigmaauto1-sigmashift0                      14  gp@l.h
 416  w90_scdm_v4_dense_withconduction_mlwf_metallic_musigmaauto1-sigmashift0                        14  gp@l.h
 417  w90_scdm_v4_dense_withconduction_mlwf_metallic_musigmaauto1-sigmashift3                        14  gp@l.h
 418  gptest_out_w90_1_withconduction_mlwf_musigmaauto1-sigmashift0                                   1  gp@l.h
 419  gptest_out_w90_1_withconduction_mlwf_musigmaauto1-sigmashift3                                   1  gp@l.h
 420  workchain_out_2                                                                                54  am@l.h

I will do some ping-pong exchange of data to see if I get correct results too.

szoupanos commented 5 years ago

The work on the new export algorithm that supports workcalculation/workchain nodes and the respective link rules will be implemented under the ticket #1758

szoupanos commented 5 years ago

The work of the export & import of DbLog entries will be performed at #1759