DataBiosphere / dsub

Open-source command-line tool to run batch computing tasks and workflows on backend services such as Google Cloud.
Apache License 2.0

Automatically create the output bucket #50

Open slagelwa opened 7 years ago

slagelwa commented 7 years ago

I'm sure it's not without its own set of problems, but it sure would be convenient to automatically create the output bucket if it doesn't already exist.

eap commented 7 years ago

Do you have specific examples of workflows where auto-creation of a bucket would be desirable? Can you rephrase your bug in the context of a specific pain point?

One big danger I see is in accidental bucket creation due to typos - as in the following example. The use case would need to be fairly compelling to balance the danger.

dsub --....lots of options.... \
     --output=gs://mybucket/pipeline/out/* \
     --output=gs://mubucket/pipeline/metadata/*.txt
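
The typo risk above could be checked before any bucket is touched. The following is only a minimal Python sketch, assuming every output path is a `gs://` URL; `bucket_of` and `suspicious_pairs` are hypothetical helper names, not part of dsub, and the 0.8 similarity threshold is an arbitrary choice.

```python
from difflib import SequenceMatcher
from itertools import combinations
from urllib.parse import urlparse


def bucket_of(gcs_url):
    """Return the bucket name from a gs:// URL."""
    parsed = urlparse(gcs_url)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"not a gs:// URL: {gcs_url}")
    return parsed.netloc


def suspicious_pairs(gcs_urls, threshold=0.8):
    """Return pairs of distinct bucket names similar enough to be typos."""
    buckets = sorted({bucket_of(url) for url in gcs_urls})
    return [
        (a, b)
        for a, b in combinations(buckets, 2)
        if SequenceMatcher(None, a, b).ratio() >= threshold
    ]


# The two --output values from the example command.
outputs = [
    "gs://mybucket/pipeline/out/*",
    "gs://mubucket/pipeline/metadata/*.txt",
]
print(suspicious_pairs(outputs))
```

A tool that auto-created buckets could refuse to proceed, or ask for confirmation, whenever this list is non-empty.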

slagelwa commented 7 years ago

I'm writing up an example of how to use dsub to run kallisto on some RNA-Seq sequencing data for ISB-CGC. The input files come from an ISB-CGC public bucket that users can read but not write to, so they'll have to specify their own output bucket. While it's a minor point to have to include instructions on how to create an output bucket, I thought it would be convenient if the bucket could be created automatically when it doesn't already exist. However, the nuances of doing this are obviously a concern.
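
One possible shape for this, sketched in Python and not taken from dsub: check whether the bucket exists and create it only when the user explicitly opts in. `ensure_bucket` and its `create_if_missing` parameter are hypothetical names. The `client` argument is assumed to expose `lookup_bucket(name)` and `create_bucket(name)`, as `google.cloud.storage.Client` does; an in-memory stand-in is used here so the sketch runs without cloud credentials.

```python
def ensure_bucket(client, name, create_if_missing=False):
    """Return True if the bucket exists or was created, else False.

    Creation is opt-in so a mistyped bucket name is reported rather
    than silently turned into a new bucket.
    """
    if client.lookup_bucket(name) is not None:
        return True
    if not create_if_missing:
        return False
    client.create_bucket(name)
    return True


class FakeClient:
    """In-memory stand-in for a storage client (illustration only)."""

    def __init__(self, existing):
        self.buckets = set(existing)

    def lookup_bucket(self, name):
        # Mirrors the real client's contract: None when the bucket is missing.
        return name if name in self.buckets else None

    def create_bucket(self, name):
        self.buckets.add(name)
        return name


client = FakeClient({"mybucket"})
found = ensure_bucket(client, "mybucket")
missing = ensure_bucket(client, "mubucket")
created = ensure_bucket(client, "mubucket", create_if_missing=True)
print(found, missing, created)
```

Defaulting `create_if_missing` to `False` keeps the current behavior and leaves auto-creation as a deliberate choice.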


-- Joe Slagel Institute for Systems Biology jslagel@systemsbiology.org