This PR cleans up import/export functionality, which is presently implemented by tarring up the vistrails project directory.
Specifically:
Export presently has a race condition: file writes are not guaranteed to be atomic, or even flushed to disk without an fsync, so the tar command can capture files in an arbitrary intermediate state.
Import necessarily forces the imported project to reuse the original exported project's identifier, since all it does is untar the repository.
There is no documented standard for the contents of an export. This makes it difficult to integrate with other potential functionality, such as Jupyter import.
Similarly, this prevents us from creating other backends (e.g., using a database for faster access).
Similarly, any change to the Viztrails on-disk schema will break imports.
With this PR, export now explicitly goes through the VistrailsRepository API to serialize the project, all branches, workflows, modules, and uploaded files. Where available, exporting uses vizier.api.serialize to accomplish this. Importing, on the other hand, makes only limited use of vizier.api.deserialize, since it needs to create "native" representations of each of the reconstituted objects.
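To illustrate the idea of exporting through serialized objects rather than tarring the on-disk directory, here is a minimal sketch. It is not the code in this PR: the function names, the dictionary layout, and the member names (`project.json`, `branches/<id>.json`) are all assumptions made for illustration, and it does not call the real VistrailsRepository or vizier.api.serialize APIs.

```python
import io
import json
import tarfile


def export_project(project: dict) -> bytes:
    """Write already-serialized project documents into an in-memory tarball.

    `project` is a hypothetical dictionary standing in for whatever the
    repository API returns; nothing is read from the project directory.
    """
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:

        def add_json(name, document):
            # Each document becomes one archive member with an explicit size.
            payload = json.dumps(document, sort_keys=True).encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))

        add_json("project.json", {"name": project["name"]})
        for branch in project["branches"]:
            add_json("branches/{}.json".format(branch["id"]), branch)
    return buffer.getvalue()


def list_members(tarball: bytes) -> list:
    """Return the sorted member names of a tarball produced above."""
    with tarfile.open(fileobj=io.BytesIO(tarball), mode="r:gz") as archive:
        return sorted(archive.getnames())
```

Because every member is built from an in-memory document, the archive contents depend only on what the API returned, not on whatever happens to be on disk at the moment tar runs.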
Tasks:
[x] Export to a tarball
[x] Import an exported tarball
[ ] Add documentation for tarball format
[x] Update import/export in API to use the new import/export functionality
Part of my development process is using diff (Meld) to compare the before and after images of the exported repository. As of right now, the differences are minimal:
Module outputs / provenance are not preserved. This is semi-intentional: the FS and Mimir backends rely on very different data layouts, so (i) I don't want to tie the export to a specific backend, and (ii) re-executing the workflow should rebuild the backend state appropriately. For now, imported modules are left in a PENDING state.
A fresh project ID is generated (this is intentional).
Creation times for the Project, Branch, Workflows, and Modules are replaced (I think I can fix this).
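The import-side differences listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: `import_project`, the dictionary keys, and the use of `uuid4` for identifiers are all hypothetical.

```python
import uuid

# Hypothetical label for the state imported modules are left in.
PENDING = "PENDING"


def import_project(exported: dict) -> dict:
    """Rebuild a project from an exported document.

    A fresh identifier is always generated, and any outputs in the
    export are ignored so that re-execution rebuilds backend state.
    """
    return {
        "id": uuid.uuid4().hex,  # never reuse the exported project's ID
        "name": exported["name"],
        "modules": [
            {"command": module["command"], "state": PENDING}
            for module in exported["modules"]
        ],
    }
```

Importing the same export twice therefore yields two independent projects, each with its own identifier.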