sereeena opened this issue 2 years ago
Hi @sereeena!
Thanks for the suggestion. As you indicated, this seems like a very natural extension of the capabilities provided for the logging path.
Is the job_id the specific field you are interested in? Just want to make sure that the feature request, when implemented, would fulfill your use case.
Thanks!
Yes, I thought it would be useful to automatically keep outputs generated by each job, linked to job_id.
The use case is more for debugging your pipeline: if you were running it multiple times with the same input, for example, you could easily keep the outputs from each run. This is what I was doing, but then I did think that in practice, in production, you would probably be creating a new bucket, putting your input file in there and then using that bucket as the output. So I'm not sure if this is enough reason to implement it? But since logging does it, I thought it might be easy to add.
Thanks
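For reference, a sketch of that manual approach (the provider, project and bucket names below are just placeholders): generate a per-run identifier before calling dsub and splice it into the output path yourself, since the job_id is only assigned after submission.

```sh
# Sketch of a workaround that is possible today: tag each run's outputs
# with an identifier generated before submission, because the dsub job_id
# is not known until after the job is created.
RUN_ID="run-$(date +%Y%m%d-%H%M%S)"

dsub \
  --provider google-cls-v2 \
  --project my-project \
  --regions us-central1 \
  --logging "gs://my-bucket/logs/" \
  --output-recursive OUTPUT_PATH="gs://my-bucket/outputs/${RUN_ID}/" \
  --image ubuntu:20.04 \
  --command 'echo "hello" > "${OUTPUT_PATH}/result.txt"' \
  --wait
```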
Currently you can specify --output-recursive OUTPUT_PATH=gs://bucket/path to have your job write output files and subdirectories to ${OUTPUT_PATH}, and these files will be copied to the specified bucket/path. But it would be very useful to be able to automatically set the path in the bucket to the job_id when calling dsub.
This is already available for --logging, where you can format the filenames of the log files using variable substitution, and by default the files are tied to the job_id.
Can we have something like --output-recursive OUTPUT_PATH=gs://bucket/{job_id}?
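To make the request concrete, a sketch of what an invocation could look like. The {job_id} substitution in the output path is the proposed behavior and does not exist in dsub yet; the provider, project and bucket names are placeholders, and the exact token (e.g. {job_id} vs {job-id}) would presumably follow whatever the logging path already uses.

```sh
# Proposed usage (sketch only): outputs keyed to the job_id automatically,
# the same way --logging already names log files per job.
dsub \
  --provider google-cls-v2 \
  --project my-project \
  --regions us-central1 \
  --logging "gs://my-bucket/logs/" \
  --output-recursive OUTPUT_PATH="gs://my-bucket/outputs/{job_id}/" \
  --image ubuntu:20.04 \
  --command 'echo "hello" > "${OUTPUT_PATH}/result.txt"'
```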