UoMResearchIT / ro-crate_snakemake_tooling

Collection of python tools for processing snakemake metadata for RO-Crate creation
MIT License

prototype workflow RO-Crate from snakemake workflow #1

Open douglowe opened 3 weeks ago

douglowe commented 3 weeks ago

Work coming from the BGE hackathon in Leiden. Reporting of products made should go in the report here: https://docs.google.com/document/d/1if6ukMKN3xHQHAwGEQPhhgvp7iQcFnauj4W1ZtIs8wk/edit

The aim is to write a Python tool that creates a workflow RO-Crate from the outputs and reports produced by a Snakemake workflow.

Snakemake workflow used: https://github.com/o-william-white/skim2mt.git

tbrown91 commented 2 weeks ago

Hi @douglowe

I am able to give this more thought this week, so I am wondering what the best next steps would be. At the moment, all of the information I have pulled from the HTML report is just sitting in variables. Do you think it will be easy to turn this into a provenance RO-Crate?

douglowe commented 2 weeks ago

Hi @tbrown91 - I'm getting a bit of time to look at this too, and have conflicting ideas about how to go about this.

In the long term I think we can extend the Snakemake runner itself, adding an 'ro-crate' report option as an alternative to the HTML report. See this issue I created in a local copy of the snakemake repo: https://github.com/eScienceLab/snakemake/issues/1

This probably should start with creating some example RO-Crate files (first a workflow crate, then script the building of a provenance crate from that, using the metadata pulled from the html report), so that we can build a test to include in the snakemake testing suites. Let's have a go at creating that this week?
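As a starting point for such an example file, here is a minimal sketch of the `ro-crate-metadata.json` a workflow crate might contain, built with only the standard library. The function name `build_workflow_crate`, the workflow path `workflow/Snakefile`, and the local `#snakemake` identifier are assumptions for illustration, not something taken from skim2mt or from the tooling in this repo.

```python
import json


def build_workflow_crate(workflow_path="workflow/Snakefile", name="skim2mt"):
    """Return a minimal workflow RO-Crate metadata document as a dict.

    workflow_path and name are hypothetical defaults; adjust to the real repo.
    """
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {
                # Metadata descriptor: points at the root dataset.
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {
                # Root dataset: the main entity is the workflow file.
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "mainEntity": {"@id": workflow_path},
                "hasPart": [{"@id": workflow_path}],
            },
            {
                # The workflow itself.
                "@id": workflow_path,
                "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
                "name": name,
                "programmingLanguage": {"@id": "#snakemake"},
            },
            {
                # Assumed local identifier for the workflow language.
                "@id": "#snakemake",
                "@type": "ComputerLanguage",
                "name": "Snakemake",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_workflow_crate(), indent=2))
```

A provenance crate could then be scripted on top of this by appending further entities to `@graph` from the metadata pulled out of the HTML report.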

tbrown91 commented 2 weeks ago

Baby steps https://github.com/UoMResearchIT/ro-crate_snakemake_tooling/commit/befd0dd079b6b81c1fb33a80b61cacd9c19af7d4

There are many things I don't like about the snakemake report, but particularly that the input and output files are not really listed or named. There are a number of wildcards left in, but maybe this is not important for a workflow RO-Crate. For the provenance RO-Crate, I think we will not be able to extract the information we are looking for.
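To see how many report entries are affected, a rough sketch like the one below could flag paths that still contain wildcards. It assumes the paths pulled from the report are plain strings and that wildcards have the simple `{name}` or `{name,constraint}` form; the helper name `unresolved_wildcards` and the example paths are hypothetical.

```python
import re

# Matches {name} or {name,constraint}; constraints containing braces are not handled.
WILDCARD = re.compile(r"\{([A-Za-z_]\w*)(?:,[^{}]*)?\}")


def unresolved_wildcards(path):
    """Return the wildcard names still present in a path pattern."""
    return [match.group(1) for match in WILDCARD.finditer(path)]


if __name__ == "__main__":
    # Hypothetical paths, not taken from the skim2mt report.
    for path in ["results/{sample}/assembly.fasta", "results/summary.txt"]:
        print(path, unresolved_wildcards(path))
```

Paths that return an empty list name concrete files, while the others are still patterns and would need the resolved file names from somewhere other than the report.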