ENCODE-DCC / croo

Cromwell output organizer
MIT License

Array[File] type #39

Open RGBEN opened 3 years ago

RGBEN commented 3 years ago

Hi, I was wondering how to pass an Array[File] output (such as a folder of files) to croo. I noticed this has been brought up before, but it's not entirely clear to me how it is done.

https://github.com/ENCODE-DCC/croo/issues/7

leepc12 commented 3 years ago

Please try this.

"path": "${i}/${basename}"

Then the files in an array output will be organized like this:

0/first_file.txt
1/second_file.txt
...

RGBEN commented 3 years ago

Sorry, I just got the chance to try it. I am getting this error in croo_out.log:

IsADirectoryError: [Errno 21] Is a directory

I should clarify that my intention is to pass one of the output directories (represented as an array of files at the workflow level) to croo. Is this possible? Or does each file have to be defined in the WDL file and passed to croo one by one? That can be quite laborious, and I would need to know exactly what the outputs are.

For example, a toy WDL file:

workflow wf {
  call analysis {
    input:
      some_input = fastq,
  }
  output {
    # folder contains multiple files of interest
    Array[File] output_dir = analysis.output_dir
  }
}

task analysis {
  output {
    File output_dir = "target_output_folder_name"
  }
  command <<<
    run program_with_many_output_files
  >>>
}

Thanks for your help again!

leepc12 commented 3 years ago

Croo cannot organize directories. I think WDL/Cromwell doesn't support a File that is actually a directory, since Cromwell localizes, delocalizes and hashes outputs on a per-file basis.

I think your toy WDL is not valid. analysis is not called inside a scatter block, and its output_dir is declared as a single File, so it cannot become an Array[File] at the workflow level.
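For comparison, here is a minimal WDL 1.0 sketch of the case where a call's output does become an Array[File]: the call runs inside a scatter, so a single File output per shard is gathered into an array outside it. The input name fastqs, the output name result, and the wc -l command are hypothetical placeholders, not taken from the original workflow.

```wdl
version 1.0

workflow wf {
  input {
    # hypothetical: one input file per shard
    Array[File] fastqs
  }
  scatter (fastq in fastqs) {
    call analysis {
      input:
        some_input = fastq
    }
  }
  output {
    # Inside the scatter analysis.result is a File;
    # outside the scatter it is gathered into an Array[File].
    Array[File] results = analysis.result
  }
}

task analysis {
  input {
    File some_input
  }
  command <<<
    # hypothetical placeholder: write one small result file per input
    wc -l ~{some_input} > result.txt
  >>>
  output {
    File result = "result.txt"
  }
}
```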

You can use glob() to collect a list of matching files instead of working with directories. For example:


task t1 {
    ....
    output {
        Array[File] bigwigs = glob("*.bigwig")
    }
}
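Applied to the toy workflow from the question, the glob approach might look like the sketch below (WDL 1.0). It assumes, hypothetically, that program_with_many_output_files writes its results directly into target_output_folder_name/ with no nested subfolders: glob() returns only files, not directories, and the pattern below does not recurse. The output name output_files is also a placeholder.

```wdl
version 1.0

workflow wf {
  input {
    File fastq
  }
  call analysis {
    input:
      some_input = fastq
  }
  output {
    # A flat list of files instead of a directory
    Array[File] output_files = analysis.output_files
  }
}

task analysis {
  input {
    File some_input
  }
  command <<<
    # placeholder from the question; assumed to write its files
    # into ./target_output_folder_name/
    program_with_many_output_files ~{some_input}
  >>>
  output {
    # Matches files (not directories) directly inside the folder
    Array[File] output_files = glob("target_output_folder_name/*")
  }
}
```

Once the outputs are exposed as an Array[File], each file is delocalized individually, and the "${i}/${basename}" path suggested earlier in this thread would place each globbed file under its array index (0/, 1/, ...).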

RGBEN commented 3 years ago

Thanks, this is helpful! I hope croo will consider supporting directories (somehow) in the future.