NAL-i5K / Organism_Onboarding

A workflow to make organism onboarding pipeline easy to handle as an I/O pipeline

LoadListingRequirement necessary for some tools #64

Closed by mpoelchau 5 years ago

mpoelchau commented 5 years ago

It looks like some of our tools generate the following warning from cwltool:

createOrganism.cwl:45:3: Recursive directory listing has resulted in a large number of File objects
                         (15796) passed to the input parameter 'in_dir'.  This may negatively
                         affect workflow performance and memory use.

                         If this is a problem, use the hint 'cwltool:LoadListingRequirement' with
                         "shallow_listing" or "no_listing" to change the directory listing behavior:

                         $namespaces:
                           cwltool: "http://commonwl.org/cwltool#"
                         hints:
                           cwltool:LoadListingRequirement:
                             loadListing: shallow_listing

I was able to fix this in createOrganism.cwl by adding the following to the CommandLineTool:

$namespaces:
  cwltool: "http://commonwl.org/cwltool#"

hints:
  cwltool:LoadListingRequirement:
    loadListing: shallow_listing
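To find other tools that might need the same hint, a rough check along these lines could help. This is only a sketch: the function name `needs_load_listing_hint` is hypothetical, and the check is a plain-text heuristic (it flags any CWL text that mentions a `Directory` type but not `LoadListingRequirement`), not a real CWL parser.

```python
import re

HINT_NAME = "LoadListingRequirement"


def needs_load_listing_hint(cwl_text: str) -> bool:
    """Heuristic: True if the text uses a Directory type but has no LoadListingRequirement.

    Hypothetical helper for illustration; it scans text and does not parse YAML.
    """
    # Ignore full-line comments so commented-out mentions are not counted.
    body = "\n".join(
        line for line in cwl_text.splitlines() if not line.lstrip().startswith("#")
    )
    has_directory = re.search(r"\bDirectory\b", body) is not None
    has_hint = HINT_NAME in body
    return has_directory and not has_hint


# Minimal tool text with a Directory input and no hint (illustrative only).
example_tool = """cwlVersion: v1.0
class: CommandLineTool
baseCommand: echo
inputs:
  in_dir:
    type: Directory
outputs: []
"""

# The same tool with the hint from this issue appended.
example_tool_with_hint = example_tool + """$namespaces:
  cwltool: "http://commonwl.org/cwltool#"
hints:
  cwltool:LoadListingRequirement:
    loadListing: shallow_listing
"""

print(needs_load_listing_hint(example_tool))
print(needs_load_listing_hint(example_tool_with_hint))
```

A tool flagged by this check is only a candidate; whether it actually triggers the warning depends on how many files the input directory contains.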

and then running it as follows (see https://www.biostars.org/p/361018/): cwl-runner --enable-ext createOrganism.cwl createorg.yml

I still need to figure out how to add the --enable-ext argument within the workflow (final-workflow.cwl). This will take me a bit of time, so I'm setting up this issue so I won't forget.

r06942072 commented 5 years ago

Related issue #57

mpoelchau commented 5 years ago

Running the final workflow with --enable-ext works, e.g. cwl-runner --enable-ext final-workflow.cwl job-[gggsss].yml. README.md should be updated to mention this.
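If the pipeline is ever launched from a script, the flag could be added in one place so it is not forgotten. A minimal sketch, assuming `cwl-runner` is on the PATH; the helper names `build_cwl_command` and `run_cwl` are hypothetical, and the file names are the ones from the createOrganism example above.

```python
import shlex
import subprocess


def build_cwl_command(cwl_file: str, job_file: str, runner: str = "cwl-runner") -> list:
    """Build the runner command line, always including --enable-ext."""
    return [runner, "--enable-ext", cwl_file, job_file]


def run_cwl(cwl_file: str, job_file: str) -> None:
    """Run the command; raises CalledProcessError if the runner exits non-zero."""
    subprocess.run(build_cwl_command(cwl_file, job_file), check=True)


# Show the command that would be run, without executing it.
command = build_cwl_command("createOrganism.cwl", "createorg.yml")
print(shlex.join(command))
```

The same helper works for final-workflow.cwl by passing that file and the matching job file.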