Open kbergin opened 5 years ago
[~chengche] I summarized the conversation about this in the data-store repo here
➤ Nick Barkas commented:
I picked this up, but it appears that recursive globbing is not supported until WDL 2.0. After a conversation with [~accountid:557058:7f318d05-6e5c-4289-a9e0-ef0441bd7e57], we will revisit this when WDL 2.0 becomes available.
I attach a test demonstrating the problem. Even this hack will not preserve the directory structure:
workflow test {
  call dir_struct_example {}
  output {
    Array[File] outputfiles = dir_struct_example.files
  }
}

task dir_struct_example {
  command {
    mkdir output
    touch output/fileX.txt
    mkdir output/dirA
    touch output/dirA/fileAA.txt
    touch output/dirA/fileAB.txt
    mkdir output/dirB
    touch output/dirB/fileBA.txt
    touch output/dirB/fileBB.txt
    mkdir output/dirB/dirBC/
    touch output/dirB/dirBC/fileBCA.txt
  }
  output {
    Array[File] files = glob("output/*")
    Array[File] files2 = glob("output/*/*")
    Array[File] files3 = glob("output/*/*/*")
  }
}
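To make the limitation concrete, here is a minimal Python sketch that builds the same tree as the test task and compares fixed-depth patterns with a recursive one. Python's `glob` module is only a stand-in here: WDL's `glob` is evaluated by the execution engine, so this illustrates the shape of the problem rather than Cromwell's actual behavior. The helper names (`build_tree`, `files_matching`, `demo`) are made up for this illustration.

```python
import glob
import os
import tempfile

# Files created by the test task, relative to the working directory.
TREE = [
    "output/fileX.txt",
    "output/dirA/fileAA.txt",
    "output/dirA/fileAB.txt",
    "output/dirB/fileBA.txt",
    "output/dirB/fileBB.txt",
    "output/dirB/dirBC/fileBCA.txt",
]


def build_tree(root):
    """Create every file in TREE (and its parent directories) under root."""
    for rel in TREE:
        path = os.path.join(root, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()


def files_matching(root, pattern, recursive=False):
    """Return the sorted relative paths of regular files matching pattern."""
    matches = glob.glob(os.path.join(root, pattern), recursive=recursive)
    return sorted(
        os.path.relpath(m, root).replace(os.sep, "/")
        for m in matches
        if os.path.isfile(m)
    )


def demo():
    """Compare one pattern per directory depth with a single recursive pattern."""
    with tempfile.TemporaryDirectory() as root:
        build_tree(root)
        return {
            "depth1": files_matching(root, "output/*"),
            "depth2": files_matching(root, "output/*/*"),
            "depth3": files_matching(root, "output/*/*/*"),
            "recursive": files_matching(root, "output/**", recursive=True),
        }


if __name__ == "__main__":
    for name, files in demo().items():
        print(name, files)
```

Each fixed-depth pattern only sees one level, so the number of patterns has to be known in advance, and the results arrive as separate flat lists. Only the recursive pattern returns the whole tree in one call.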
Blocked by WDL features
Why? Previously, neither WDL nor the HCA data store could support directory structures by having `/` in filenames. We use zarr as one of our sparse matrix output file formats for the HCA, so we had to use a placeholder in its place. See the GitHub issue in Data Store addressing the directory structure for more background. WDL also now supports directory outputs. Implementation issue in Data Store
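As a rough illustration of the placeholder workaround described above, the following Python sketch flattens a relative path into a single filename and restores it afterwards. The placeholder character is not given in this issue, so `!` below is purely an assumption, and `flatten` / `unflatten` are hypothetical helper names, not functions from skylab or the data store.

```python
PLACEHOLDER = "!"  # assumption: the real placeholder is not stated in this issue


def flatten(rel_path, placeholder=PLACEHOLDER):
    """Encode a relative path as one flat filename with no '/' in it."""
    if placeholder in rel_path:
        # Refuse ambiguous input: the round trip would not be reversible.
        raise ValueError(f"path already contains {placeholder!r}: {rel_path}")
    return rel_path.replace("/", placeholder)


def unflatten(flat_name, placeholder=PLACEHOLDER):
    """Recover the original relative path from a flattened filename."""
    return flat_name.replace(placeholder, "/")


if __name__ == "__main__":
    flat = flatten("output/dirB/dirBC/fileBCA.txt")
    print(flat)
    print(unflatten(flat))
```

The scheme works only while the placeholder never appears in a real file or directory name, which is one reason storing `/` directly is preferable once both WDL and the data store allow it.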
Where to start: https://github.com/HumanCellAtlas/skylab/blob/0b3ecaf1d6a10b986eb55521320e8f319f5dc93b/library/tasks/ZarrUtils.wdl#L38
A good place to start would be to put example outputs that have the slashes into the data store to enable downstream components to test.
ACs:
`/` in the filenames in SS2 and Optimus outputs
┆Issue is synchronized with this Jira Bug