toil-rnaseq does not recognize s3 bucket without subdirectory

BD2KGenomics / toil-scripts

Toil workflows for common genomic pipelines

Apache License 2.0


Issue #419

Closed. Jeltje closed this issue 7 years ago.

Jeltje commented 8 years ago

I'm not sure this is a bug, precisely, but it may need some explanation.

When I give an S3 bucket as output-dir in the config file, the run fails unless I first create a 'directory' inside the bucket. So this works:

output-dir: s3://varscan-hg19-input/testout

But this fails:

output-dir: s3://varscan-hg19-input

with the error "The specified domain does not exist." (full log)

If this is expected behavior, then maybe the config should be more explicit. It currently reads:

# Required: Output location of sample. Can be full path to a directory or an s3:// URL
# Warning: S3 buckets must exist prior to upload or it will fail.
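To see how the two output-dir values differ, it helps to split an s3:// URL into its bucket and key. The sketch below uses Python's standard urllib.parse; the helper name split_s3_url is hypothetical and not part of toil-rnaseq. It shows that s3://varscan-hg19-input/testout carries a key ("testout") while the bare bucket URL carries an empty one. Whether that empty key is what tripped up the pipeline is an assumption; the thread does not confirm it.

```python
from urllib.parse import urlparse


def split_s3_url(url):
    """Split an s3:// URL into (bucket, key).

    Hypothetical helper for illustration; not part of toil-rnaseq.
    Assumes the key contains no '?' or '#', which urlparse would
    treat as a query string or fragment.
    """
    parsed = urlparse(url)
    if parsed.scheme != "s3":
        raise ValueError(f"not an s3:// URL: {url!r}")
    if not parsed.netloc:
        raise ValueError(f"missing bucket name: {url!r}")
    # With a bucket present, the path is either empty or starts with '/';
    # that leading '/' is a separator, not part of the S3 key.
    key = parsed.path[1:]
    return parsed.netloc, key


if __name__ == "__main__":
    for url in (
        "s3://varscan-hg19-input/testout",
        "s3://varscan-hg19-input/",
        "s3://varscan-hg19-input",
    ):
        print(f"{url} -> {split_s3_url(url)}")
```

Both the bare bucket and the bucket with a trailing slash yield an empty key, so any code that assumes a non-empty key prefix has to handle that case explicitly.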

jvivian commented 8 years ago

@Jeltje The full log is missing the error message — is there another log?

jvivian commented 8 years ago

@Jeltje — Update on this?

jvivian commented 8 years ago

@JakeNarkizian — can you comment on this?

hannes-ucsc commented 8 years ago

Neither s3://varscan-hg19-input/testout nor s3://varscan-hg19-input should be accepted as a valid directory URL. Directory URLs should always be required to end in a slash, and that requirement should be checked early.

hannes-ucsc commented 8 years ago

What I mean is that @Jeltje should try again with s3://varscan-hg19-input/, and toil-rnaseq should assert/require that input URLs pointing at a directory end in /.
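A minimal sketch of the early check described here: reject any directory URL that does not end in '/', before any work is scheduled. The function name require_dir_url is hypothetical; this is not the code that was actually merged into toil-rnaseq.

```python
def require_dir_url(url):
    """Fail fast unless `url` ends in '/', marking it as a directory.

    Hypothetical helper sketching the check suggested in this thread;
    not the code that was merged into toil-rnaseq.
    """
    if not url.endswith("/"):
        suggestion = url + "/"
        raise ValueError(
            f"directory URL must end in '/': got {url!r}, did you mean {suggestion!r}?"
        )
    return url


if __name__ == "__main__":
    for candidate in ("s3://varscan-hg19-input/", "s3://varscan-hg19-input/testout"):
        try:
            print("accepted:", require_dir_url(candidate))
        except ValueError as err:
            print("rejected:", err)
```

Calling a check like this when the config file is loaded, rather than at the first upload, is what makes the failure happen early instead of partway through a run.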

jvivian commented 7 years ago

Hannes identified the issue: omitting the trailing slash reproduces the error. toil-rnaseq has been updated to address this. The error is particularly insidious because, if the trailing slash is not enforced, a run may end up overwriting the same s3:// URL over and over again.
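The overwrite hazard can be illustrated with a hypothetical convention in which a destination ending in '/' is treated as a directory prefix and anything else as the complete target URL. Under that assumption (the thread does not show how toil-rnaseq actually built its output URLs), every sample's output resolves to the same URL when the slash is missing. The function output_url and the sample file names are invented for illustration.

```python
def output_url(dest, filename):
    """Return the URL `filename` would be written to under a hypothetical rule.

    A destination ending in '/' is treated as a directory prefix; any other
    destination is treated as the complete target URL and used unchanged.
    """
    if dest.endswith("/"):
        return dest + filename
    return dest


if __name__ == "__main__":
    samples = ["sample1.tar.gz", "sample2.tar.gz"]  # invented sample names
    for dest in ("s3://varscan-hg19-input/testout/", "s3://varscan-hg19-input/testout"):
        targets = [output_url(dest, name) for name in samples]
        # Fewer distinct targets than samples means later uploads overwrite earlier ones.
        status = "distinct" if len(set(targets)) == len(targets) else "overwrites"
        print(f"{dest}: {targets} ({status})")
```

Nothing fails loudly in the second case; each upload simply replaces the previous one, which is why rejecting the ambiguous URL up front is safer than guessing the user's intent.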