use `verdi node graph generate` to understand effect of node deletion / export

ltalirz commented 5 years ago

It is often difficult for new users to predict the effect of verdi node export or verdi node delete on a given node. It would be extremely useful, if one could use verdi node graph generate with a corresponding --export/--delete option (or similar) to understand what would be happening, i.e.:

[ ] Given a node identifier, draw the graph of the export file created by verdi export create -N <identifier> (in order to check whether it will contain everything necessary)
[ ] Given a node identifier, draw the graph of the nodes that will be deleted by verdi node delete <identifier>

Currently, this is not really possible since the logic of graph generation and node deletion/export are slightly different.

Furthermore, it would seem wise to harmonize the cli options of node deletion & export (reproduced below), since these operations are kind of "dual" to each other. Graph generation is a different operation, but reusing some options here might make sense as well (less important).

Finally, I suggest to make the help strings of the options more self-explanatory, e.g. replacing

Toggle the call_calc_backward graph traversal rule 
  used in computing the complete node set.[default: False]

with

Whether to follow CALL links to calculations backwards 
when constructing the node set. [default: False]

Mentioning @chrisjsewell @giovannipizzi @CasperWA for comment and @danieleongari for info.

Node deletion

$ verdi node delete -h
  --call-calc-forward / --no-call-calc-forward
                                  Toggle the call_calc_forward graph traversal
                                  rule used in computing the complete node
                                  set.  [default: False]
  --call-work-forward / --no-call-work-forward
                                  Toggle the call_work_forward graph traversal
                                  rule used in computing the complete node
                                  set.  [default: False]
  --create-forward / --no-create-forward
                                  Toggle the create_forward graph traversal
                                  rule used in computing the complete node
                                  set.  [default: True]

Export

$ verdi export create -h
...
  --call-calc-backward / --no-call-calc-backward
                                  Toggle the call_calc_backward graph
                                  traversal rule used in computing the
                                  complete node set.  [default: False]
  --call-work-backward / --no-call-work-backward
                                  Toggle the call_work_backward graph
                                  traversal rule used in computing the
                                  complete node set.  [default: False]
  --create-backward / --no-create-backward
                                  Toggle the create_backward graph traversal
                                  rule used in computing the complete node
                                  set.  [default: True]
  --input-calc-forward / --no-input-calc-forward
                                  Toggle the input_calc_forward graph
                                  traversal rule used in computing the
                                  complete node set.  [default: False]
  --input-work-forward / --no-input-work-forward
                                  Toggle the input_work_forward graph
                                  traversal rule used in computing the
                                  complete node set.  [default: False]
  --return-backward / --no-return-backward
                                  Toggle the return_backward graph traversal
                                  rule used in computing the complete node
                                  set.  [default: False]
...

Graph generation

$ verdi node graph generate -h
...
  -o, --process-out               Show outgoing links for all processes.
  -i, --process-in                Show incoming links for all processes.

sphuber commented 5 years ago

Just want to point out that I recently actually made the generation of the CLI options for verdi export create and verdi node delete so that they now are consistent. They have the exact same names and docstrings. The only difference is that not all flags are available to the two commands and for good reason. The help message can maybe be updated but since they are automatically generated it might not be trivial.

I can see how it might be useful to first see the effect that the command will have, by visualizing it. But as soon as you have any typical size export, there will be so many nodes, I wonder what information it would give by just looking at it. I think the graph would huge. Still, maybe for delete it makes a lot of sense. Maybe a better solution is to give a --dry-run option to both commands that instead of printing node uuids to screen, generate a graph.

ltalirz commented 5 years ago

The only difference is that not all flags are available to the two commands and for good reason.

On second look, this makes sense. I still propose to add a sentence to the description explaining why this is the case.

The help message can maybe be updated but since they are automatically generated it might not be trivial.

Ok, I'll have a look.

But as soon as you have any typical size export, there will be so many nodes, I wonder what information it would give by just looking at it. I think the graph would huge.

Well, in particular for high-throughput studies what you typically do is to start with exporting the calculations for one material, which is quite manageable. In any case, we can easily add a node limit to the graph generation (say, 1000 nodes by default). You could either make the limit configurable via a cli option or prompt the user whether to continue if the limit is reached.

Maybe a better solution is to give a --dry-run option to both commands that instead of printing node uuids to screen, generate a graph.

I think that's a good idea since it scales better if there are more cli commands where we would like to add such a functionality.

giovannipizzi commented 5 years ago

In any case, we can easily add a node limit to the graph generation (say, 1000 nodes by default)

Just a comment: not sure this would be very useful, in this case, because if you want to know which nodes will be deleted and you see only some of them, you might think that these are the only ones getting deleted and that it's ok and delete them, discovering later you deleted more

ltalirz commented 5 years ago

Just a comment: not sure this would be very useful, in this case, because if you want to know which nodes will be deleted and you see only some of them, you might think that these are the only ones getting deleted and that it's ok and delete them, discovering later you deleted more

To clarify: That's not what I'm proposing - it would simply say there are too many nodes and not print anything. Anyhow, I'm not sure this is really needed.

ltalirz commented 5 years ago

While modifying the help strings I noticed that:

there seems to be a create_forward rule (not exposed, makes sense) but no corresponding return_forward rule
there are no input_calc_backward / input_work_backward rules

Is it just that these rules were not needed? Should they be added?

sphuber commented 5 years ago

* there are no `input_calc_backward` / `input_work_backward` rules

They are defined here: https://github.com/aiidateam/aiida-core/blob/dfd616ff78a574fa218bfc25eb3b7dfa80cb325c/aiida/common/links.py#L56 Or do you mean something else? Same goes for return_forward

ltalirz commented 5 years ago

Ah, thanks - then I guess the only thing is that they don't make their way to the cli.

sphuber commented 5 years ago

Yes, exactly, that is decided based on the toggleable property of the GraphTraversalRule tuple. If it is not, it won't be added as an option

aiidateam / aiida-core