galaxyproject / training-material

A collection of Galaxy-related training material
https://training.galaxyproject.org
MIT License

Dockstore/WFH upload of Workflows #3049

Open hexylena opened 2 years ago

hexylena commented 2 years ago

Just an issue to track planning / progress

supernord commented 1 year ago

Hey @hexylena - is anyone currently working on options for this?

hexylena commented 1 year ago

We had some contact in recent years with someone who intended to, but I have not heard from them in a long time. I think it's safe to say no one is working on it.

hexylena commented 1 year ago

In the short term we're implementing the bare minimum for a TRS endpoint, because that achieves useful things for us, but in the future we'd love to actually have workflows with great metadata, uploaded/updated to WFH.

supernord commented 1 year ago

thanks 😄 is there an issue or PR for the endpoint?

supernord commented 1 year ago

I'm trying to automatically parse information from the GTN repository that could be used in WorkflowHub

hexylena commented 1 year ago

It was in https://github.com/galaxyproject/training-material/pull/3881/commits/3a4a051d632139237e0a4bb85c8b9e61a3858b33

We have an API if it's helpful, https://training.galaxyproject.org/training-material/api/, but the documentation there hasn't been updated for the new TRS endpoint I added.

When we've discussed it in the past with the Galaxy IWC / @mvdbeek the discussion trended towards:

But if you have alternative ideas that are easier to implement and don't involve managing some 60+ repositories (even automatically), I think we'd be curious to hear them

hexylena commented 1 year ago

also if you need any new endpoints / datasets exposed to make it easier just give me a shout.

supernord commented 1 year ago

Hey @hexylena and also @stain 😄

Would it make sense to do the following? I might be missing something obvious.

  1. extract metadata + workflow for each tutorial using the GTN API
  2. collect metadata needed for a RO-crate
  3. create RO-crates for all GTN workflows
  4. submit RO-crates to WfH

Apologies if this approach has already been discussed and ruled out.
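For reference, steps 2 and 3 above could be sketched roughly as follows. This is only a minimal illustration, assuming the RO-Crate 1.1 JSON-LD layout; `build_crate_metadata` and the `#name` style author identifiers are hypothetical names made up for this sketch, and a crate that WfH accepts will likely need more fields (license, workflow language, and so on).

```python
import json


def build_crate_metadata(workflow_file, name, authors):
    """Return a minimal ro-crate-metadata.json document as a dict.

    Hypothetical helper for illustration: it only covers the metadata
    descriptor, the root dataset, the workflow file and its authors.
    """
    # One Person entity per author, with a local "#..." identifier (an assumption).
    author_entities = [
        {"@id": "#" + a.replace(" ", "-").lower(), "@type": "Person", "name": a}
        for a in authors
    ]
    graph = [
        {
            # Metadata descriptor required by RO-Crate 1.1.
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {
            # Root dataset, pointing at the workflow as its main entity.
            "@id": "./",
            "@type": "Dataset",
            "name": name,
            "hasPart": [{"@id": workflow_file}],
            "mainEntity": {"@id": workflow_file},
        },
        {
            "@id": workflow_file,
            "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
            "name": name,
            "author": [{"@id": e["@id"]} for e in author_entities],
        },
    ]
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": graph + author_entities,
    }


if __name__ == "__main__":
    # Example values only; the author name is a placeholder.
    doc = build_crate_metadata("mothur-miseq-sop.ga", "Mothur MiSeq SOP", ["Jane Doe"])
    print(json.dumps(doc, indent=2))
```

Building the document with plain dictionaries keeps the sketch dependency-free; a dedicated RO-Crate library would probably be the better choice in real code.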

hexylena commented 1 year ago

That sounds potentially fine to me?

  1. The topic listing currently has the most metadata per tutorial, I'll see if I can fix the tutorial listing to have the workflow information as well.
  2. Not every workflow is currently annotated with authorship information, we can require that going forward.
  3. Will this be idempotent? If we attempt to re-upload a crate with the identical workflow/metadata, will it avoid generating a new "version" of the workflow on the WfH end? Or do we need to write some logic around change detection there?

@mvdbeek any opinions on this, since you're heavily involved in WFs as well?
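If it turns out we do need our own change detection for point 3, one possible shape is a content digest compared against the last uploaded one. This is just a sketch under assumptions: `crate_digest`, `needs_upload` and the `seen` mapping are hypothetical names, and it says nothing about how WfH itself handles duplicate uploads.

```python
import hashlib
import json


def crate_digest(workflow_bytes, metadata):
    """Return a SHA-256 hex digest over the workflow file and its metadata.

    The metadata dict is serialised with sorted keys so that key order
    alone never changes the digest.
    """
    canonical = json.dumps(metadata, sort_keys=True, separators=(",", ":"))
    h = hashlib.sha256()
    h.update(workflow_bytes)
    h.update(b"\0")  # separator between the two inputs
    h.update(canonical.encode("utf-8"))
    return h.hexdigest()


def needs_upload(workflow_id, digest, seen):
    """True if this workflow has no recorded digest or the digest changed.

    `seen` maps workflow id -> digest of the last successful upload
    (where that mapping is stored is left open here).
    """
    return seen.get(workflow_id) != digest
```

The idea would be to skip the upload whenever `needs_upload` returns False, and to record the new digest only after an upload succeeds.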

hexylena commented 1 year ago

https://github.com/galaxyproject/training-material/pull/3895 will make the API a bit nicer to work with, and add enforced linting for metadata on workflows going forward:

$ curl --silent http://localhost:4002/training-material/api/topics/metagenomics/tutorials/mothur-miseq-sop/tutorial.json | jq .workflows
[
  {
    "workflow": "mothur-miseq-sop.ga",
    "tests": false,
    "url": "http://localhost:4002/training-material/topics/metagenomics/tutorials/mothur-miseq-sop/workflows/mothur-miseq-sop.ga"
  }
]

i.e. you can get this directly from the tutorial page rather than having to look at the topic page (though the addition of the direct URL to the workflow will extend to both).

And if there are more APIs needed that would make things more convenient just let us know.
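For anyone scripting against this, the same lookup could be done in Python along these lines. It's a rough sketch assuming the response shape shown in the curl output above; the helper names are made up for illustration, and the production `API_BASE` is an assumption that combines the public API URL with the path used against localhost.

```python
import json
from urllib.request import urlopen

# Assumed production base URL; the curl example above ran against localhost.
API_BASE = "https://training.galaxyproject.org/training-material/api"


def tutorial_api_url(topic, tutorial, base=API_BASE):
    """Build the tutorial.json URL following the path in the curl example."""
    return f"{base}/topics/{topic}/tutorials/{tutorial}/tutorial.json"


def workflow_entries(tutorial_doc):
    """Return (filename, url, has_tests) tuples from a parsed tutorial.json."""
    return [
        (w["workflow"], w["url"], bool(w.get("tests")))
        for w in tutorial_doc.get("workflows") or []
    ]


def fetch_workflows(topic, tutorial, base=API_BASE):
    """Fetch tutorial.json over HTTP and return its workflow entries."""
    with urlopen(tutorial_api_url(topic, tutorial, base)) as resp:
        return workflow_entries(json.load(resp))
```

Keeping `workflow_entries` separate from the HTTP call means it can be run on saved JSON without hitting the server.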

mvdbeek commented 1 year ago

I think whatever you decide is fine. I'd probably try to steer towards a model where you can version your workflows and upload them only when tests pass. That is what https://github.com/galaxyproject/iwc/blob/main/.github/workflows/workflow_test.yml does, and it should largely be reusable as is.

hexylena commented 1 year ago

Fair enough @ tests, we've been pushing for those from our side as well but no one ever wants to write them for some reason. Maybe it's time to make them mandatory. (Edit: wf tests now mandatory. https://github.com/galaxyproject/training-material/pull/3895)

mvdbeek commented 1 year ago

It should be fairly easy with https://github.com/galaxyproject/iwc/blob/main/workflows/README.md#generate-test-from-a-workflow-invocation, and since you need small test data for the tutorial anyway it shouldn't be more work to generate this.

hexylena commented 1 year ago

I know! We have documentation on it in multiple places and everything, but still seems like a high bar for folks unfortunately. Not sure why. Maybe we just don't nag enough.

Looking at the workflow_test.yml, it's great to have that as reference. I fear/suspect we'll end up writing our own, as we'd like to test against EU rather than a one-off server, to avoid some of the time costs of testing workflows, and additionally benefit potentially from having a "previously run public workflow" that we can attach as a resource to a training material.

mvdbeek commented 1 year ago

(no need to install from git, probably better if you don't)

hexylena commented 1 year ago

Ah indeed, I think that was written at some point before a new release was cut, so it's very outdated. Thanks! I'll get that corrected.

mvdbeek commented 1 year ago

If planemo was written in typescript we could just generate all that in the UI 😆. Or maybe we could set up a celery task that re-uses the dependency mechanism to install planemo ...

hexylena commented 1 year ago

Generate the test? Ah it'd be so cool.

I keep having the exact same thought about ptdk / training_init, we could replace this entire thing with a few calls to the API and generate a markdown file from that, rather than requiring server side processes. (Of course we'd need some CORS exceptions, but, it'd be worth it.)

hexylena commented 1 year ago

Just following up with this again, were you still planning to work on this @supernord? Is there any support you need from our end? (it's getting mentioned in a presentation as "work in progress" so figured I'd check in)

supernord commented 1 year ago

Hey @hexylena - thanks for following up 😄 I'm trying to finish code to collect the required metadata from the GTN API as a first step, before then trying to create an RO-crate. Maybe I could get your thoughts on the approach, and if it will work, when I push the code to GitHub?

supernord commented 1 year ago

I would like to also discuss this with the WorkflowHub club team next week

hexylena commented 1 year ago

Yes, absolutely, feel free to open a PR somewhere/tag me somewhere and I'll be happy to look at it!

After reading the RO-Crate training materials they added to the GTN https://training.galaxyproject.org/training-material/topics/fair/ I have to say I feel a lot more hopeful for this!

supernord commented 11 months ago

Hey @hexylena & @mvdbeek - this is where I've added the code I have so far for converting GTN metadata into RO-crates https://github.com/AustralianBioCommons/create-gtn-rocrates

Hopefully this is useful 😄