Open denis-yuen opened 1 year ago
Yes, nasty problem. Apart from the fact that it's impossible to recreate the original directory tree of a workflow with multiple descriptors (or, more generally, files, as non-descriptor files are affected, too), allowing ".." may have security implications as well.
Might be hard to come up with a non-breaking fix here, though given that the relative path behavior seems to be only explained in the descriptions, we might be able to just get away with rephrasing these such that the anchoring for relative paths should be the nearest parent directory for all files (e.g., the repository root). It'd still break existing clients though...
But if we're discussing breaking changes, we could also re-design the whole file handling to address the following issues:
/tools/{id}/versions/{version_id}/{type}/descriptor/{relative_path}
could lead to very long URLs that could be problematic in extreme casesOne possibility could be to do away with the /descriptor
, /tests
and /containerfile
endpoints in favor of a redesigned /files
endpoint that makes use of HATEOAS (see #160) and uses file IDs instead of relative paths to fetch info about (ToolFile
) and/or contents of (FileWrapper
) individual files. It might also make the API less complex.
For example, I could hit /files
to get an object of file ID keys and ToolFile
values, as well as a _links
section that points me to /files/{file_id}
for each file to access the corresponding FileWrapper
(or plain text). The bahvior for zip=True
on /files
could remain the same (i.e., fetch an archive of all files). For convenience, we could consider adding filters for file types (PRIMARY_DESCRIPTOR
, TEST_FILE
, CONTAINERFILE
etc.).
One possibility could be to do away with the
/descriptor
,/tests
and/containerfile
endpoints in favor of a redesigned/files
endpoint that makes use of HATEOAS (see #160) and uses file IDs instead of relative paths to fetch info about (ToolFile
) and/or contents of (FileWrapper
) individual files. It might also make the API less complex.
Still thinking it through, but one option that we have is to alter the behaviour of the zipped endpoint
https://github.com/ga4gh/tool-registry-service-schemas/blob/v2.0.1/openapi/openapi.yaml#L322 so that as you say, the root of the zip is the "nearest parent directory for all files" rather than the the directory for main.wdl
The reason why this is less(?) of a breaking change is because our current implementation is simply broken for this case (i.e. one could argue that we're not making the situation worse at least) The client would need to do some comparison between the zip format and the json format of the same endpoint to locate main.wdl
A few new endpoints does seem less disruptive to backwards compatibility, thoughts @patmagee ?
I also think that the minor/non-breaking changes are acceptable, given that there probably aren't that many TRS services and clients around.
As for a re-design: Having a new /files
endpoint (possibly without /{type}
or similar endpoint with the described functionalities while possibly deprecating the ones that such an endpoint would make redundant would indeed be sensible, especially for such a drastic change.
See discussion at https://github.com/dockstore/dockstore/issues/5594 We believe this is a general issue for TRS.
i.e. workflows with a structure like
Affects
"/tools/{id}/versions/{version_id}/{type}/descriptor/{relative_path}":
┆Issue is synchronized with this Jira Story ┆Project Name: Zzz-ARCHIVE GA4GH tool-registry-service ┆Issue Number: TRS-67