ga4gh / tool-registry-service-schemas

APIs for discovering genomics tools, their metadata and their containers
Apache License 2.0

Include Snakemake in descriptor type enumerator #156

Closed uniqueg closed 3 years ago

uniqueg commented 4 years ago

We are currently finalizing support for GA4GH TES in Snakemake (see https://github.com/snakemake/snakemake/tree/integrate_tes_v2). A Snakemake WES is also in the works.

To allow integration with TRS as well, we would thus appreciate if SMK (.smk is the official file extension for Snakemake files) could be added to the DescriptorType schema:

    DescriptorType:
      type: string
      description: The type of descriptor that represents this version of the tool
        (e.g. CWL, WDL, NFL, or GALAXY). Note that these files can also include
        associated Docker/container files and test parameters that further
        describe a version of a tool.
      enum:
        - CWL
        - WDL
        - NFL
        - GALAXY
        - SNAKEMAKE
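
For illustration only, here is a minimal Python sketch of how a client might mirror the proposed enum and reject unknown labels. The class and the `parse_descriptor_type` helper are hypothetical names, not part of the TRS schema or any GA4GH library, and the `SNAKEMAKE` label is just the value shown in the excerpt above, not a settled choice.

```python
from enum import Enum


class DescriptorType(str, Enum):
    """Hypothetical mirror of the proposed DescriptorType enum."""

    CWL = "CWL"
    WDL = "WDL"
    NFL = "NFL"
    GALAXY = "GALAXY"
    SNAKEMAKE = "SNAKEMAKE"  # proposed addition; label taken from the excerpt above


def parse_descriptor_type(value: str) -> DescriptorType:
    """Return the matching enum member, or raise ValueError for unknown labels."""
    try:
        return DescriptorType(value)
    except ValueError:
        allowed = ", ".join(member.value for member in DescriptorType)
        raise ValueError(
            f"unsupported descriptor type {value!r}; expected one of: {allowed}"
        ) from None
```

With this sketch, `parse_descriptor_type("SNAKEMAKE")` returns the enum member, while any label missing from the enum raises `ValueError`. That is why a single agreed spelling matters for clients.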

I'm happy to file a PR if I get a positive response on here.

┆Issue is synchronized with this Jira Story ┆containerName: GA4GH tool-registry-service ┆Issue Number: TRS-43

ohofmann commented 4 years ago

Very much interested in this. @jb-adams - can I summon you here for help or pointers?

denis-yuen commented 4 years ago

Cool, and glad to see Snakemake joining the GA4GH community!

jb-adams commented 4 years ago

Looks great! See comments:

To allow integration with TRS as well, we would thus appreciate if SMK (.smk is the official file extension for Snakemake files) could be added to the DescriptorType schema:

      ...
      enum:
        ...
        - SNAKEMAKE

Will the accepted value for Snakemake be SNAKEMAKE or SMK? Both are proposed above; can we settle on one value? And can we harmonize whatever value we use here with the other APIs (e.g. TES and WES)?

uniqueg commented 4 years ago

Thanks a lot for the positive responses!

Regarding naming, I wanted to suggest SMK, as it is the official file extension and it is more consistent with the 3-letter codes used for CWL, WDL and NFL. However, as GALAXY has also been included, this is hardly a convincing reason. That's also likely the reason why I already forgot about it again by the time I posted the updated schema. 🙃

There is nothing in the WES or TES specs that speaks against including Snakemake as a supported workflow language, and there is no recommendation on how to name languages; Snakemake itself is not yet mentioned anywhere.

I will ask the main Snakemake implementer on his preference for the name.

jb-adams commented 4 years ago

sounds good @uniqueg

Either is fine, as long as we're consistent across GA4GH APIs. If/when TES and WES plan to incorporate snakemake, they can refer here to TRS to see how the label appears.

johanneskoester commented 4 years ago

Hi folks! Awesome, I am very happy to see this inclusion. If you are referring to the file format or language here, you could indeed use SMK, as this is the recommended file extension. The entrypoint workflow file is usually called Snakefile without an ending though (in the spirit of GNU Make). So, I personally have no preference.
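
As a rough sketch of the two naming conventions mentioned here (the `.smk` extension and an extensionless `Snakefile` entry point), a registry client could use a heuristic like the one below. The function name `looks_like_snakemake` is hypothetical and is not part of Snakemake or TRS.

```python
from pathlib import PurePosixPath


def looks_like_snakemake(path: str) -> bool:
    """Heuristic: True for '*.smk' files or a file named exactly 'Snakefile'."""
    p = PurePosixPath(path)
    return p.suffix == ".smk" or p.name == "Snakefile"
```

Because a workflow can span several such files, a check like this only classifies individual files. It does not identify the workflow as a whole.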

uniqueg commented 4 years ago

Great! I'd probably go with SMK then, in the spirit of brevity and consistency with most of the others. SNAKEFILE might perhaps be a bit misleading, given that a tool/workflow can contain multiple files.

Let's wait until after the Plenary to see if anyone has any doubts and then I could file a PR?