aiidateam / aiida-core

The official repository for the AiiDA code
https://aiida-core.readthedocs.io

Push-pull of AiiDA DBs #4535

Open giovannipizzi opened 3 years ago

giovannipizzi commented 3 years ago

I am opening this issue to brainstorm on the features that a push-pull mechanism (similar to git) for mirroring two AiiDA DBs might need. Once implemented, and combined with a third, central place (similar to GitHub, e.g. Materials Cloud), it would ease collaboration between people. It would also help a single person who uses multiple machines (e.g. a workstation to run simulations and a laptop just for data analysis).

I suggest using the comments to brainstorm on features. If you later want to add or change features, consider editing your own comment rather than posting a new one, to keep the thread short. When we start converging, we'll turn this into an AEP.

For reference, this is relevant for #974 (export an entire DB) and would need the upcoming ability to export groups in various modes, which is going to be implemented in #4383. Or, the other way around: #4383 should ideally take into account at least the possible needs of this push-pull mechanism.

giovannipizzi commented 3 years ago

Some feature requests (in random order, not necessarily all critical):

I would start with a proof of concept in which the various steps are run by hand with separate scripts, so that we can also benchmark them. Wrapping them into 'push/pull' commands can be the last step, done at the end.

For reference, here are 5 scripts that begin to do what I describe, although they are far from perfect:

1-get-all-uuids.py.txt
2-get-missing-uuids.py.txt
3-import.py.txt
4-get-all-groups.py.txt
5-overwrite-all-groups.py.txt
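A minimal sketch of what steps 1 and 2 compute, assuming that step 1 dumps the UUIDs known to the destination and step 2 diffs them against the source. This reading of the attached scripts is an assumption. In a real profile the UUID sets would come from a database query (for instance an AiiDA `QueryBuilder` projecting `uuid`). Here plain Python sets stand in for them, and `dump_uuids`, `load_uuids` and `missing_uuids` are hypothetical names.

```python
import json
import tempfile
from pathlib import Path


def dump_uuids(uuids, path):
    """Step 1 (hypothetical): write the UUIDs known to one database to a JSON file."""
    Path(path).write_text(json.dumps(sorted(uuids)))


def load_uuids(path):
    """Read back a UUID list written by dump_uuids."""
    return set(json.loads(Path(path).read_text()))


def missing_uuids(source_uuids, destination_uuids):
    """Step 2 (hypothetical): UUIDs present in the source but absent from the destination.

    These are the nodes that would be exported from the source and then
    imported (step 3) into the destination.
    """
    return set(source_uuids) - set(destination_uuids)


if __name__ == "__main__":
    # Stand-in UUID sets; a real run would query each AiiDA profile instead.
    destination = {"a1", "b2"}
    source = {"a1", "b2", "c3", "d4"}
    with tempfile.TemporaryDirectory() as tmp:
        listing = Path(tmp) / "destination-uuids.json"
        dump_uuids(destination, listing)
        print(sorted(missing_uuids(source, load_uuids(listing))))  # ['c3', 'd4']
```

Only the small UUID listing has to travel between machines for the diff; the (potentially large) export is then limited to the missing nodes.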

Notes:

The last step is clearly suboptimal and may fail. The idea is that #4383 should implement the flags needed so that steps 4 and 5 are no longer required. In step 1, one might (or might not) also want to send the list of groups. In step 2, one would export either all groups or only the missing ones. The import in step 3 would then use the appropriate flags (to be defined in #4383) to mirror the groups.
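The group mirroring that steps 4 and 5 approximate could be computed as a diff rather than a blanket overwrite. This is a hypothetical illustration, not the behavior planned for #4383. Groups are modeled as a mapping from label to a set of node UUIDs, matching groups by label is an assumption, and `plan_group_mirror` and `GroupDiff` are invented names.

```python
from dataclasses import dataclass, field


@dataclass
class GroupDiff:
    """Changes needed so that one destination group mirrors the source group (hypothetical)."""
    create: bool = False
    add: set = field(default_factory=set)
    remove: set = field(default_factory=set)


def plan_group_mirror(source_groups, destination_groups):
    """Compare {group_label: set_of_node_uuids} mappings and return a per-group plan.

    Only groups that differ appear in the plan. Groups that exist only in the
    destination are left untouched; whether to delete them is a policy decision.
    """
    plan = {}
    for label, src_nodes in source_groups.items():
        dst_nodes = destination_groups.get(label)
        if dst_nodes is None:
            # Group missing in the destination: create it with all source members.
            plan[label] = GroupDiff(create=True, add=set(src_nodes))
            continue
        diff = GroupDiff(
            add=set(src_nodes) - set(dst_nodes),
            remove=set(dst_nodes) - set(src_nodes),
        )
        if diff.add or diff.remove:
            plan[label] = diff
    return plan


if __name__ == "__main__":
    source = {"relax": {"a", "b"}, "bands": {"c"}}
    destination = {"relax": {"a", "x"}}
    for label, diff in sorted(plan_group_mirror(source, destination).items()):
        print(label, diff)
```

A per-group plan touches only the groups that actually differ, which should make the final step less fragile than overwriting every group.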

If this works, I suggest we update the scripts (reducing them to three) and start testing them to see which issues remain. There will be many: in step 1 we also need to send groups and computers; in step 2 we might want to also send back a list of nodes that no longer exist; in step 3 we might want to delete those nodes (but only if this does not also delete nodes that should remain). We also need to decide whether or not to export unsealed ProcessNodes. My suggestion is no: we should export only sealed ones. Otherwise, on a second import, the attributes will not be updated, because a node with that UUID already exists in the destination.
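Two of the open questions above can be sketched in plain Python: exporting only sealed ProcessNodes, and deleting nodes only when that does not remove nodes that should remain. This is a simplified model under stated assumptions. Nodes are dicts with hypothetical `is_process` and `sealed` keys (AiiDA process nodes do carry a sealed flag, but this dict layout is invented). Deleting a node is assumed to cascade to everything downstream of it along outgoing links, which is cruder than AiiDA's actual deletion rules. All function names are hypothetical.

```python
from collections import deque


def exportable(nodes):
    """Keep non-process nodes, and process nodes only if they are sealed (i.e. finished)."""
    return [n for n in nodes if not n["is_process"] or n["sealed"]]


def downstream_closure(start, outgoing):
    """All nodes reachable from `start` by following outgoing links (start included)."""
    seen = set(start)
    queue = deque(start)
    while queue:
        node = queue.popleft()
        for child in outgoing.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def safe_deletions(deleted_on_source, still_on_source, outgoing):
    """Return the destination nodes to delete, or raise if the cascade hits a node that must remain.

    Assumption: deleting a node also deletes everything downstream of it.
    """
    cascade = downstream_closure(deleted_on_source, outgoing)
    conflicts = cascade & set(still_on_source)
    if conflicts:
        raise ValueError(f"deletion would also remove nodes that should remain: {sorted(conflicts)}")
    return cascade


if __name__ == "__main__":
    links = {"a": ["b"], "b": ["c"]}  # a -> b -> c
    print(sorted(safe_deletions({"a"}, {"x"}, links)))  # ['a', 'b', 'c']
```

Raising instead of silently deleting keeps the destination unchanged whenever the source's deletions and its remaining nodes disagree, leaving the decision to the user.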