HumanCellAtlas / secondary-analysis

Secondary Analysis Service of the Human Cell Atlas Data Coordination Platform
https://pipelines.data.humancellatlas.org/ui/
BSD 3-Clause "New" or "Revised" License

Fix ! workaround in zarr converter #750

Open kbergin opened 5 years ago

kbergin commented 5 years ago

Blocked by WDL features

Why? Previously both WDL and the HCA data store were unable to support directory structures by having "/" in filenames. We use zarr as one of our sparse matrix output file formats for the HCA and had to use "!" as a placeholder. See the GitHub issue in Data Store addressing the directory structure for more background. WDL also now supports directory outputs. See also the implementation issue in Data Store.
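For readers unfamiliar with the workaround, here is a minimal Python sketch of the placeholder idea. It is an illustration only, not the code in ZarrUtils.wdl: the function names `flatten_name` and `restore_path` and the sample path are hypothetical.

```python
from pathlib import PurePosixPath

# Placeholder character that stands in for "/" (as described in this issue).
PLACEHOLDER = "!"


def flatten_name(relative_path: str) -> str:
    """Encode a path inside a zarr store as one flat filename.

    Hypothetical helper: refuses names that already contain the
    placeholder, because those could not be restored unambiguously.
    """
    if PLACEHOLDER in relative_path:
        raise ValueError(f"path already contains {PLACEHOLDER!r}: {relative_path}")
    return relative_path.replace("/", PLACEHOLDER)


def restore_path(flat_name: str) -> PurePosixPath:
    """Invert flatten_name: turn the placeholders back into directory separators."""
    return PurePosixPath(flat_name.replace(PLACEHOLDER, "/"))


if __name__ == "__main__":
    flat = flatten_name("sample.zarr/expression/0.0")
    print(flat)
    print(restore_path(flat))
```

Removing the workaround would mean downstream consumers no longer need the restore step, which is why example outputs with real slashes are useful for testing.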

Where to start: https://github.com/HumanCellAtlas/skylab/blob/0b3ecaf1d6a10b986eb55521320e8f319f5dc93b/library/tasks/ZarrUtils.wdl#L38

A good place to start would be to put example outputs that have the slashes into the data store to enable downstream components to test.

ACs:

in the filenames in SS2 and Optimus outputs


kbergin commented 5 years ago

[~chengche] I summarized the conversation about this in data-store repo here

kbergin commented 5 years ago

➤ Nick Barkas commented:

I picked this up, but it appears that recursive globbing is not supported until WDL 2.0. After conversation with [~accountid:557058:7f318d05-6e5c-4289-a9e0-ef0441bd7e57] we will revisit this when WDL 2.0 becomes available.
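As an analogy only (this uses Python's glob module, not Cromwell's or WDL's glob, whose behavior may differ), the sketch below shows why fixed-depth patterns are awkward: each depth needs its own pattern, and the depth must be known in advance, whereas a recursive pattern covers the whole tree. The helper names are hypothetical; the file layout mirrors the test in this comment.

```python
import glob
import os
import tempfile


def build_example_tree(root: str) -> None:
    """Create the same layout as the WDL test: output/ with nested dirs."""
    files = [
        "fileX.txt",
        "dirA/fileAA.txt",
        "dirA/fileAB.txt",
        "dirB/fileBA.txt",
        "dirB/fileBB.txt",
        "dirB/dirBC/fileBCA.txt",
    ]
    for rel in files:
        path = os.path.join(root, "output", *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()


def files_by_depth(root: str, max_depth: int) -> list:
    """One non-recursive glob per depth level (output/*, output/*/*, ...)."""
    found = []
    for depth in range(1, max_depth + 1):
        pattern = os.path.join(root, "output", *(["*"] * depth))
        found.extend(p for p in glob.glob(pattern) if os.path.isfile(p))
    return sorted(os.path.relpath(p, root) for p in found)


def files_recursive(root: str) -> list:
    """A single recursive glob that descends to any depth."""
    pattern = os.path.join(root, "output", "**", "*")
    matches = glob.glob(pattern, recursive=True)
    return sorted(os.path.relpath(p, root) for p in matches if os.path.isfile(p))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        build_example_tree(root)
        print(files_by_depth(root, 2))   # misses the file three levels down
        print(files_recursive(root))     # finds every file
```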

I attach a test demonstrating the problem. Even this hack will not preserve the directory structure:

workflow test {
    call dir_struct_example {}

    output {
        Array[File] outputfiles = dir_struct_example.files
    }
}

task dir_struct_example {
    command {
        mkdir output
        touch output/fileX.txt
        mkdir output/dirA
        touch output/dirA/fileAA.txt
        touch output/dirA/fileAB.txt
        mkdir output/dirB
        touch output/dirB/fileBA.txt
        touch output/dirB/fileBB.txt
        mkdir output/dirB/dirBC/
        touch output/dirB/dirBC/fileBCA.txt
    }
    output {
        Array[File] files = glob("output/*")
        Array[File] files2 = glob("output/*/*")
        Array[File] files3 = glob("output/*/*/*")
    }