galaxyproject / galaxy

Data intensive science for everyone.
https://galaxyproject.org

Reworking History Export/Import (doubling as Research Object?) #4345

Closed mvdbeek closed 5 years ago

mvdbeek commented 7 years ago

The current history export/import does not work well for dataset collections (in fact these are being ignored), while in the worst case collection input to a tool may break history export (See #4336).

It would be great if we could make the history export collection aware, both for import and export.

Given that we can extract a workflow from a history, it would perhaps even be worth considering to export histories as (incomplete ...) research objects (http://www.researchobject.org/specifications/).

guerler commented 6 years ago

Does the history import/export for regular datasets work? I tried a cleanly cloned release and was unable to re-import a history with a single dataset.

mvdbeek commented 6 years ago

Yes, that works for me (on the latest dev). Have you seen any errors?

mvdbeek commented 6 years ago

For another, bigger history I did see this on import:

Traceback (most recent call last):
  File "/Users/mvandenb/src/galaxy/lib/galaxy/jobs/__init__.py", line 1467, in cleanup
    galaxy.tools.imp_exp.JobImportHistoryArchiveWrapper(self.app, self.job_id).cleanup_after_job()
  File "/Users/mvandenb/src/galaxy/lib/galaxy/tools/imp_exp/__init__.py", line 138, in cleanup_after_job
    info=dataset_attrs['info'].encode('utf-8'),
AttributeError: 'NoneType' object has no attribute 'encode'

That should be simple enough to fix.
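The fix alluded to above can be sketched as a simple None guard before encoding (a hypothetical helper for illustration, not the actual Galaxy patch):

```python
# Hypothetical sketch of the guard cleanup_after_job would need:
# dataset_attrs['info'] can be None for datasets that never set an
# info string, so encode only when a value is actually present.

def safe_info(dataset_attrs):
    """Return the UTF-8 encoded info string, or None if absent."""
    info = dataset_attrs.get('info')
    return info.encode('utf-8') if info is not None else None

# The failing case from the traceback: 'info' is None.
print(safe_info({'info': None}))   # -> None
print(safe_info({'info': 'ok'}))   # -> b'ok'
```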

guerler commented 6 years ago

I still get this error here:

galaxy.jobs ERROR 2017-11-22 08:56:42,516 Unable to cleanup job 1285
Traceback (most recent call last):
  File "/Users/guerler/galaxy/lib/galaxy/jobs/__init__.py", line 1466, in cleanup
    galaxy.tools.imp_exp.JobImportHistoryArchiveWrapper(self.app, self.job_id).cleanup_after_job()
  File "/Users/guerler/galaxy/lib/galaxy/tools/imp_exp/__init__.py", line 72, in cleanup_after_job
    history_attrs = load(open(history_attr_file_name))
IOError: [Errno 2] No such file or directory: u'/Users/guerler/galaxy/database/tmp/tmpbR7db4/history_attrs.txt'

Additionally the imported history remains empty.

mvdbeek commented 6 years ago

Can you check that history_attrs.txt is in the exported archive? How are you doing the import, from a URL or by uploading the archive?

guerler commented 6 years ago

Both import methods throw the same error, and the history_attrs.txt file is present within the binary archive.
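A quick way to do this kind of check is to list the members of the export archive with the standard library; a minimal sketch, assuming the export is a (possibly gzipped) tar archive and using a stand-in archive built on the fly:

```python
import os
import tarfile
import tempfile

def archive_members(path):
    """List member names in a (possibly gzipped) tar archive."""
    with tarfile.open(path, 'r:*') as tar:
        return tar.getnames()

# Build a tiny stand-in archive to demonstrate the check.
tmpdir = tempfile.mkdtemp()
attrs_path = os.path.join(tmpdir, 'history_attrs.txt')
with open(attrs_path, 'w') as fh:
    fh.write('{}')
archive_path = os.path.join(tmpdir, 'export.tar.gz')
with tarfile.open(archive_path, 'w:gz') as tar:
    tar.add(attrs_path, arcname='history_attrs.txt')

print('history_attrs.txt' in archive_members(archive_path))  # -> True
```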

jmchilton commented 6 years ago

xref #3088

sneumann commented 6 years ago

Hi, the initial comment says "dataset collections (in fact these are being ignored)", so the title's mention of research objects is a bit misleading; the collection problem is also covered in #3088, as mentioned above. Since my history has collections, my workaround is to unhide deleted datasets, copy the history to a new one including deleted datasets, and then re-create the collections on the target.

mvdbeek commented 5 years ago

@jmchilton has invested a huge amount of work and completely overhauled history import/export; stubs for BagIt archives of history exports are present in https://github.com/galaxyproject/galaxy/pull/7367 and will be in 19.05.
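For context, the structure a BagIt archive adds on top of a plain export is small: a `data/` payload directory plus two tag files. A minimal stdlib sketch of that layout (hypothetical helper, not Galaxy's actual exporter in #7367):

```python
import hashlib
import os
import tempfile

def make_bag(bag_dir, payload):
    """Lay out a minimal BagIt bag: a data/ payload directory plus the
    required bagit.txt declaration and an md5 payload manifest."""
    data_dir = os.path.join(bag_dir, 'data')
    os.makedirs(data_dir)
    manifest_lines = []
    for name, content in sorted(payload.items()):
        with open(os.path.join(data_dir, name), 'wb') as fh:
            fh.write(content)
        digest = hashlib.md5(content).hexdigest()
        manifest_lines.append('%s  data/%s' % (digest, name))
    with open(os.path.join(bag_dir, 'bagit.txt'), 'w') as fh:
        fh.write('BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n')
    with open(os.path.join(bag_dir, 'manifest-md5.txt'), 'w') as fh:
        fh.write('\n'.join(manifest_lines) + '\n')

bag = tempfile.mkdtemp()
make_bag(bag, {'history_attrs.txt': b'{}'})
print(sorted(os.listdir(bag)))  # -> ['bagit.txt', 'data', 'manifest-md5.txt']
```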

jmchilton commented 5 years ago

@mvdbeek I wouldn't close this at all; I feel like #7367 was a good first step that set things up, but it really was just the first step. We should make these proper research objects, not just bags. We should extend the import/export store concept to include workflows and workflow invocations, we should define a workflow provenance profile, etc. There seems to be a lot left to do.

mvdbeek commented 5 years ago

There's also https://github.com/galaxyproject/galaxy/issues/3088, which is slightly more specific; I was going to keep that one open.

jmchilton commented 5 years ago

Ahh, as long as there is one open that is fine, good call - thanks.