ga4gh / tool-registry-service-schemas

APIs for discovering genomics tools, their metadata and their containers
Apache License 2.0
30 stars 18 forks source link

Tool Descriptor relative path is not discoverable #20

Closed delagoya closed 6 years ago

delagoya commented 7 years ago

I do not think the values for following path are discoverable:

/tools/{id}/versions/{version-id}/{type}/descriptor/{relative-path}

I assume the intent is to be able to navigate to subsumed CWL files that are encoded in a main CWL file of a workflow description. Is that correct? If so then it should be documented.

denis-yuen commented 7 years ago

It is documented although we welcome making it more understandable since it doesn't seem to be right now.

We currently have

/tools/{id}/versions/{version-id}/{type}/descriptor/{relative-path}:
    get:
      summary: Get additional tool descriptor files (CWL/WDL) relative to the main file
description: Returns additional CWL or WDL descriptors for the specified tool in the same or subdirectories

and

        - name: relative-path
          in: path
          required: true
          type: string
description: A relative path to the additional file (same directory or subdirectories), for example 'foo.cwl' would return a 'foo.cwl' from the same directory as the main descriptor

https://github.com/ga4gh/tool-registry-schemas/blob/develop/src/main/resources/swagger/ga4gh-tool-discovery.yaml#L203

It is true that the relative files aren't discoverable from the API directly, it would require knowledge of the language whether it be CWL or WDL.

delagoya commented 7 years ago

Yes that is exactly my point. CWL or WDL knowledge to even know that there are other files there to grab.

WDL import statements are a straight forward full URI https://github.com/broadinstitute/wdl/blob/develop/SPEC.md#import-statements

CWL is more complex in that run can be a SubWorkflow with lots of options, only one of which is another CWL specification that is assumed to be a file on the currently accessible filesystem. From examples that I have seen, sub-workflows tend to be collected into a subdirectory, and the URL scheme will reflect that. Will this swagger definition of (relative-path} handle non-encoded URLs?

E.g. if CWL foo.cwl references foo/bar.cwl should the URL endpoint to the API be

/tools/123/versions/0.0.1/CWL/descriptor/foo/bar.cwl

or

/tools/123/versions/0.0.1/CWL/descriptor/foo%2Fbar.cwl

?

denis-yuen commented 7 years ago

Oh, encoding is defined in the swagger specification. Parameters are encoded by default and one would need to specify "allowReserved" under https://swagger.io/specification/#fixed-fields-51 in order to avoid encoding.

delagoya commented 7 years ago

Got it, so the second form.

tetron commented 7 years ago

@delagoya CWL follows URI relative-reference rules, so references are composed from the base URI of the originating document. So CWL foo.cwl references foo/bar.cwl the URL endpoint to the API should be /tools/123/versions/0.0.1/CWL/descriptor/foo/bar.cwl (the tool API must expose it this way or else the ability to load & execute directly from a http URL won't work).

However you're right that to request documents over http URL encoding is required for other characters (such as spaces), and references in CWL currently are not URL encoded. For CWL, the rule should probably be that relative references have URL encoding applied, but absolute references (with a scheme) don't.

delagoya commented 7 years ago

Got it, so the second form then ;-)

Having said that, the above is a problem for the CWL executor engine, which may or may not be the provider of the TRS endpoint. I would assume that the executor engine would want to download the CWL descriptor files prior to running anything, so it would be a moot point whether the TRS endpoint only supports the URL escaped version, no?

tetron commented 7 years ago

Presumably the swagger spec says it must be url-encoded so that you can distinguish between a slash used as a path separator and one that's part of the path part (?). But at the risk of repeating myself, slashes are understood to be path separators for relative references in a CWL file and don't get URL encoded in the request, so if TRS exposes them with the slashes URL encoded , that will needlessly break CWL's internal references. You're right it could download a manifest of files and reconstruct the directory structure locally, but that would be an unnecessary extra step and defeats the purpose of using URLs in the first place.

geoffjentry commented 7 years ago

@delagoya I wouldn't base too much on what that WDL spec document says. At the moment the only WDL implementation that handles imports that I know of is Cromwell (considering how few WDL impls there are, that effectively means it's the only one) and we only handle file URIs at the moment.

denis-yuen commented 6 years ago

FYI, we've added an endpoint to list secondary files

@garyluu can you confirm whether this works with cwltool to grab additional files for launching from a URL or whether the encoding issue as described above affects us?

garyluu commented 6 years ago

It seems slashes require encoding.

denis-yuen commented 6 years ago

@garyluu
Any luck with allowReserved? This would be a good test to add to the Dockstore implementation. i.e. launching a simple tool using imports directly from the TRS V2 relative endpoints

garyluu commented 6 years ago

allowReserved looks to be an OAPI 3.0 feature which we currently don't have yet

denis-yuen commented 6 years ago

@garyluu Hmm, time for some creative OpenAPI 2.0 workaround. Or perhaps this is something that we can do in the actual implementation even if there is no explicit way to indicate it in the API specification short of comments.

denis-yuen commented 6 years ago

FYY, my understanding is that this has been addressed in Dockstore https://github.com/ga4gh/dockstore/pull/1172 and to a lesser extent https://github.com/ga4gh/dockstore/issues/1097

To sum up:

Action items for @garyluu