vortexing closed this issue 4 years ago.
Running into the same issue as well.
java -Dconfig.file=cromwell-aws.conf -jar cromwell-42.jar run file_copy_test.wdl --options cromwell-options.json
Tried with v40, v41, and v42; same issue each time.
Quick test script hacked together from other scripts:
file_copy_test.wdl
workflow WGS_BAM_to_GVCF {
  String input_file = "s3://bucket/file"

  # Merge per-interval GVCFs
  call MergeGVCFs {
    input:
      input_file = input_file
  }

  # Outputs that will be retained when execution is complete
  output {
    File output_vcf = MergeGVCFs.output_vcf
  }
}

#### TASKS ####

# Merge GVCFs generated per-interval for the same sample
task MergeGVCFs {
  File input_file
  String output_file_name = "output.txt"

  Int machine_mem_gb = 2
  Int command_mem_gb = machine_mem_gb - 1

  command {
    echo ${input_file} > ${output_file_name}
  }

  runtime {
    docker: "ubuntu"
    memory: "${machine_mem_gb}G"
    cpu: 1
  }

  output {
    File output_vcf = "${output_file_name}"
  }
}
cromwell-options.json
{
  "final_workflow_outputs_dir": "s3://3-bucket",
  "use_relative_output_paths": true,
  "final_workflow_log_dir": "s3://s3-bucket/wf_logs"
}
error:
[2019-06-15 19:50:15,63] [error] WorkflowManagerActor Workflow c9dd69e1-121e-45bc-911f-92d6bb6a2074 failed (during FinalizingWorkflowState): cromwell.engine.io.IoAttempts$EnhancedCromwellIoException: [Attempted 1 time(s)] - IllegalArgumentException: copying directories is not yet supported: s3://s3bucket/WGS_BAM_to_GVCF/c9dd69e1-121e-45bc-911f-92d6bb6a2074/call-MergeGVCFs/output.txt
Caused by: java.lang.IllegalArgumentException: copying directories is not yet supported: s3://s3bucket/c9dd69e1-121e-45bc-911f-92d6bb6a2074/call-MergeGVCFs/output.txt
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:216)
at org.lerch.s3fs.S3FileSystemProvider.copy(S3FileSystemProvider.java:420)
at java.nio.file.Files.copy(Files.java:1274)
at better.files.File.copyTo(File.scala:663)
at cromwell.core.path.BetterFileMethods.copyTo(BetterFileMethods.scala:425)
at cromwell.core.path.BetterFileMethods.copyTo$(BetterFileMethods.scala:424)
at cromwell.filesystems.s3.S3Path.copyTo(S3PathBuilder.scala:160)
at cromwell.engine.io.nio.NioFlow.$anonfun$copy$1(NioFlow.scala:84)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
at cats.effect.internals.IORunLoop$.cats$effect$internals$IORunLoop$$loop(IORunLoop.scala:87)
at cats.effect.internals.IORunLoop$RestartCallback.signal(IORunLoop.scala:351)
at cats.effect.internals.IORunLoop$RestartCallback.apply(IORunLoop.scala:372)
at cats.effect.internals.IORunLoop$RestartCallback.apply(IORunLoop.scala:312)
at cats.effect.internals.IOShift$Tick.run(IOShift.scala:36)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
The root cause is in the s3fs S3FileSystemProvider library. Support for copying directories appears to have been implemented in this fork.
ha.... so Lee, we should just use Nextflow eh? :)
That was the only version of the library I could quickly find that had support for directory copying. It's worth looking into what changes were necessary.
A little more research suggests that a TransferManager (https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/index.html?com/amazonaws/services/s3/transfer/TransferManager.html) might be what is needed to support copying both files and directories in this method: https://github.com/broadinstitute/cromwell/blob/19d86f07d5618760e420cc218c7ff7749711d6ba/filesystems/s3/src/main/java/org/lerch/s3fs/S3FileSystemProvider.java#L411
@vortexing I'm currently working on this issue. Can you please provide your full WDL and JSON files? And what is the aim of your workflow? I need to make sure that this is not an issue with path building in the Cromwell methods. Thanks.
Did you try the test WDL and JSON provided by robthompsonweb? All of my workflows hit this same error, and they all have essentially that same basic structure.
@robthompsonweb As I understand it, you expect output.txt to be a file (not a directory) located at s3://3-bucket/WGS_BAM_to_GVCF/workflow/call-MergeGVCFs/output.txt/ and the logs at s3://s3-bucket/wf_logs. Am I right?
@TimurKustov Correct (apart from the trailing slash on output.txt; not sure if that is a typo, but it would break things on S3). It would copy output.txt to s3://3-bucket/WGS_BAM_to_GVCF/workflow/call-MergeGVCFs/output.txt, i.e.:
aws s3 cp localpath/WGS_BAM_to_GVCF/workflow/call-MergeGVCFs/output.txt s3://3-bucket/WGS_BAM_to_GVCF/workflow/call-MergeGVCFs/output.txt
I would expect it to copy every file from the outputs into the path specified by final_workflow_outputs_dir. If use_relative_output_paths is false, it would be:
s3://3-bucket/output.txt
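The mapping between an execution path and its final output path can be sketched as follows. This is a hypothetical helper, not Cromwell's code, and the directionality follows my reading of the Cromwell workflow-options documentation, where use_relative_output_paths = true strips the intermediate workflow-name/id/call-* folders:

```python
import posixpath

def final_output_path(output_path: str, executions_root: str,
                      call_dir: str, final_dir: str,
                      use_relative_output_paths: bool) -> str:
    """Illustrate where a call output lands under final_workflow_outputs_dir.

    With use_relative_output_paths, only the path relative to the call
    directory is kept; otherwise the full folder structure under the
    executions root is preserved.
    """
    base = call_dir if use_relative_output_paths else executions_root
    return posixpath.join(final_dir, posixpath.relpath(output_path, base))

# Hypothetical paths, mirroring the layout in the error message above:
root = "s3://work-bucket"
call = root + "/WGS_BAM_to_GVCF/1234/call-MergeGVCFs"
out = call + "/output.txt"

print(final_output_path(out, root, call, "s3://3-bucket", True))
# -> s3://3-bucket/output.txt
print(final_output_path(out, root, call, "s3://3-bucket", False))
# -> s3://3-bucket/WGS_BAM_to_GVCF/1234/call-MergeGVCFs/output.txt
```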
We are ready to submit a PR with a fix for this issue, and we have performed manual testing on several backends (AWS, GCP, Local), but we ran into some trouble creating an integration test: in particular, we didn't find a way to pass workflow options to Cromwell running in server mode. Is this possible, and is an integration test required for this issue? @wleepang
@TimurKustov You just need to specify it in the Centaur test description, with a pointer to where the options file lives.
@geoffjentry Could you also please help with another question about *.test files: is it possible to run multiple separate workflows (WDL or CWL) in one test? Not imported as sub-workflows, but as completely independent workflows?
Thanks!
@likeanowl Not really. What are you looking to do here?
@geoffjentry I'm trying to write an integration test for my fix for this issue. @TimurKustov and I came up with the idea of executing two workflows sequentially: run the first workflow so that its outputs, results, and call logs get copied after execution, then run a second workflow that checks that those files exist and are in the correct locations.
@wleepang I've added TransferManager and some other improvements in PR https://github.com/broadinstitute/cromwell/pull/5110 Could you please review it?
@wleepang @geoffjentry Any updates about PR #5110 ? :smile:
@likeanowl - Took a look at the PR. Overall, looks good, but had a couple questions. Do the new integration tests you mention cover the points I brought up - i.e. mostly around default credentials use and default region config?
I'm getting the same error when using
output {
  Array[File] files = glob("*.txt")
}
PR #5110 has merged, closing
Ran a workflow using Cromwell v40 on AWS Batch that had as outputs (one outputfile.vcf for each of the shards in the workflow):
I used the following workflow options:
All calls of the workflow completed successfully but the workflow itself failed.
Error Message I got:
"copying directories is not yet supported: s3://s3.amazonaws.com/bucket/Cromwell/results/workflowName/1ec38d0b-afc4-4cd5-90f1-f015395d6e36/call-task/shard-0/outputfile.vcf"
Oddly enough, the correct prefixes for the output files were created in the correct S3 bucket; they just don't have an object there, and via the CLI they appear as directories.
For the logs, a prefix was made that is empty, and the log file was written successfully to one level higher than the prefix it is supposed to be in. So instead of:
there is:
Thoughts?
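One note on the "prefixes appear as directories" observation: S3 has no real directories. Clients like the AWS CLI infer "folders" from the flat keyspace whenever a delimiter-based listing returns a common prefix, so a stray zero-byte key ending in "/" (such as one left behind by a failed directory-style copy) shows up as an empty folder. A toy model of that listing behavior (hypothetical helper, not AWS code):

```python
def list_with_delimiter(keys, prefix="", delimiter="/"):
    """Toy model of S3 ListObjectsV2 with a delimiter.

    Keys are a flat namespace; any key under `prefix` that still contains
    the delimiter is collapsed into a CommonPrefix, which clients render
    as a 'directory'.
    """
    common_prefixes, objects = set(), []
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return sorted(common_prefixes), objects

# A zero-byte key ending in "/" makes "outputfile.vcf" look like a directory:
keys = [
    "results/wf/shard-0/outputfile.vcf/",  # hypothetical stray marker key
    "results/wf/shard-0/rc",
]
print(list_with_delimiter(keys, "results/wf/shard-0/"))
# -> (['results/wf/shard-0/outputfile.vcf/'], ['results/wf/shard-0/rc'])
```

This would explain seeing an "empty directory" named after the output file: the copy created the marker key but never wrote the object's bytes.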