broadinstitute / cromwell

Scientific workflow engine designed for simplicity & scalability. Trivially transition from one-off use cases to massive-scale production environments
http://cromwell.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

WDL result folder contains a glob folder in Cromwell V50, unlike V49 #5524

Open conanyangqun opened 4 years ago

conanyangqun commented 4 years ago

Thanks for your work on Cromwell. I don't think the following is necessarily a bug. I wrote a workflow in WDL that uses glob to collect its output files, like this:

task xxx {
    output {
        Array[File] output_files = glob("~{output_basename}_?P.fq.gz")
    }
}
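To make the pattern concrete, here is a small Scala sketch of which file names `~{output_basename}_?P.fq.gz` selects. It uses Java's `PathMatcher` glob syntax as a stand-in for the shell glob Cromwell actually evaluates (both treat `?` as exactly one character), and `sample` is a hypothetical value for `output_basename`.

```scala
import java.nio.file.{FileSystems, PathMatcher, Paths}

object GlobDemo {
  // Hypothetical value for ~{output_basename}; the real one comes from the task's inputs.
  val outputBasename = "sample"

  // Java glob syntax used as a stand-in for the shell glob Cromwell evaluates.
  // "?" matches exactly one character, so only names like sample_1P.fq.gz qualify.
  val matcher: PathMatcher =
    FileSystems.getDefault.getPathMatcher(s"glob:${outputBasename}_?P.fq.gz")

  def matches(name: String): Boolean = matcher.matches(Paths.get(name))

  def main(args: Array[String]): Unit = {
    val candidates = Seq(
      "sample_1P.fq.gz",  // true
      "sample_2P.fq.gz",  // true
      "sample_1U.fq.gz",  // false: U is not P
      "sample_10P.fq.gz"  // false: "?" matches only one character
    )
    candidates.foreach(name => println(s"$name -> ${matches(name)}"))
  }
}
```

Every file matched this way ends up in the `Array[File]` output, which is what later gets copied to the final outputs directory.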

Here xxx is the task name, and the task's result is exposed in the workflow's output section. I also pass the following options.json to set some workflow options:

{
    "final_workflow_outputs_dir": "xxx",
    "use_relative_output_paths": true
}

Here xxx is an absolute path. When I test this on a Cromwell V49 server running on an SGE cluster (CentOS 7.4), everything works as expected: the xxx folder contains the result files of task xxx. When I run the same workflow on a Cromwell V50 server on the same SGE cluster, the xxx folder instead contains a glob folder, and the task's result files are inside it. I want all the result files directly in the xxx folder. Is that possible in V50? Thanks!

dformoso commented 3 years ago

Any ideas on how to solve this? I'm bumping against this same issue on v52

conanyangqun commented 3 years ago

> Any ideas on how to solve this? I'm bumping against this same issue on v52

It seems the behavior is the same on V66. I still have no idea how to fix it.

kshakir commented 4 months ago

We also ran for a while with the following workflow options and got extra glob folders in our output paths:

{
    "final_workflow_outputs_dir": "xxx",
    "use_relative_output_paths": true
}

For anyone running a fork of Cromwell, or anyone who wants to ask the Cromwell developers whether changing this would be a breaking change: on our instance I briefly tried modifying this line:

lazy val truncateRegex = ".*/call-[^/]*/(shard-[0-9]+/)?(cacheCopy/)?(attempt-[0-9]+/)?(execution/)?".r

to:

lazy val truncateRegex = ".*/call-[^/]*/(shard-[0-9]+/)?(cacheCopy/)?(attempt-[0-9]+/)?(execution/)?(glob-[0-9a-f]+/)?".r

It seemed to work: the glob folder was removed from the paths of files copied into xxx.
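The effect of the regex change can be checked in isolation. This Scala sketch applies both patterns to a made-up execution path (the workflow id `1234` and the glob hash are invented) and assumes the relative output path is whatever remains after the first regex match is removed; how Cromwell itself applies `truncateRegex` may differ in detail.

```scala
import scala.util.matching.Regex

object TruncateRegexDemo {
  // The original pattern quoted in the comment above.
  val originalRegex: Regex =
    ".*/call-[^/]*/(shard-[0-9]+/)?(cacheCopy/)?(attempt-[0-9]+/)?(execution/)?".r

  // The modified pattern: additionally strips an optional glob-<hex>/ directory.
  val patchedRegex: Regex =
    ".*/call-[^/]*/(shard-[0-9]+/)?(cacheCopy/)?(attempt-[0-9]+/)?(execution/)?(glob-[0-9a-f]+/)?".r

  // Assumption: the relative output path is the input path with its first match removed.
  def relativePath(regex: Regex, path: String): String = regex.replaceFirstIn(path, "")

  def main(args: Array[String]): Unit = {
    // Hypothetical paths; the workflow id and glob hash are made up for illustration.
    val globbed = "/cromwell-executions/wf/1234/call-xxx/execution/glob-0123456789abcdef0123456789abcdef/sample_1P.fq.gz"
    val plain = "/cromwell-executions/wf/1234/call-yyy/shard-0/execution/out.txt"

    println(relativePath(originalRegex, globbed)) // glob-0123456789abcdef0123456789abcdef/sample_1P.fq.gz
    println(relativePath(patchedRegex, globbed))  // sample_1P.fq.gz
    println(relativePath(patchedRegex, plain))    // out.txt (non-glob outputs are unaffected)
  }
}
```

With the original pattern the relative path keeps the `glob-<hash>/` prefix, so the copied file lands in a glob folder under xxx; with the patched pattern only the file name remains.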

However, I ultimately pursued a different implementation. Using a customized external tool, we now only copy outputs reported as File or Directory by the /describe endpoint, which we are already using for validation.
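A rough Scala sketch of the type-based filtering idea. It does not model the real /describe response: the `OutputDecl` case class and the output names are hypothetical, and WDL types are represented as plain strings such as `Array[File]`. Treating any output whose type mentions File or Directory as copyable (including arrays and optionals) is an assumption; the comment above does not say how compound types are handled.

```scala
import scala.util.matching.Regex

object OutputTypeFilter {
  // Hypothetical, simplified output declaration: a name and its WDL type written as a string.
  final case class OutputDecl(name: String, wdlType: String)

  // Matches File or Directory as a whole word, so "Array[File]" and "File?" qualify but "String" does not.
  private val fileLike: Regex = """\b(File|Directory)\b""".r

  def isFileLike(decl: OutputDecl): Boolean = fileLike.findFirstIn(decl.wdlType).isDefined

  // Names of the outputs an external copy tool would pick up.
  def selectCopyable(decls: Seq[OutputDecl]): Seq[String] =
    decls.filter(isFileLike).map(_.name)

  def main(args: Array[String]): Unit = {
    val decls = Seq(
      OutputDecl("wf.output_files", "Array[File]"),
      OutputDecl("wf.read_count", "Int"),
      OutputDecl("wf.report", "File?"),
      OutputDecl("wf.sample_name", "String")
    )
    println(selectCopyable(decls)) // List(wf.output_files, wf.report)
  }
}
```

Selecting outputs by declared type sidesteps the path-truncation question entirely: the tool decides what to copy from the workflow's output declarations rather than from the layout of the execution directories.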