DataBiosphere / dsub

Open-source command-line tool to run batch computing tasks and workflows on backend services such as Google Cloud.
Apache License 2.0

Support real-time logging #41

Open parnurzeal opened 7 years ago

parnurzeal commented 7 years ago

I find it hard to debug when something goes wrong: I have to wait for the job to finish and then inspect the whole log file in the GCS bucket. It would be nice to be able to see the log in real time (maybe in Stackdriver).
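As a stopgap, a small poll loop can approximate tailing the log as it lands in GCS. This is only a sketch: the bucket and object path below are placeholders for whatever was passed to `dsub --logging`, it assumes `gsutil` is installed locally, and it only shows content after dsub's logging has actually been copied to the bucket, so it is not truly real-time.

```bash
#!/bin/bash
# Hypothetical workaround: periodically fetch new bytes of the dsub log object.
# LOG is a placeholder; use the path given to `dsub --logging`.
LOG="gs://my-bucket/logs/my-job.log"
OFFSET=0
while true; do
  # Object size from `gsutil stat`; empty until the log object exists.
  SIZE=$(gsutil stat "${LOG}" 2>/dev/null | awk '/Content-Length/ {print $2}')
  if [[ -n "${SIZE}" && "${SIZE}" -gt "${OFFSET}" ]]; then
    # Print only the bytes written since the last poll.
    gsutil cat -r "${OFFSET}-" "${LOG}"
    OFFSET="${SIZE}"
  fi
  sleep 30
done
```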

mbookman commented 7 years ago

This has been discussed in the past for the Pipelines API. I did not find an internal feature request to track this, so I have filed one.

kplaney commented 7 years ago

Echoing this sentiment - I was about to post this feature request. It's a slight tweak on the above, since you can already see the dsub log files in your designated gcloud logging bucket as they are created, but if the job fails, none of your intermediate files, including any logs you may be writing out yourself, are pushed up to your bucket:

"Right now dsub seems like an either-or situation: if you are running a few tasks in your dsub script, all of which produce intermediate files you write to your OUTPUT_DIR, if task 3 fails, then the output files from tasks 1 and 2 are not pushed up, even if they are written OUTPUT_DIR. This includes log files written by the user to OUTPUT_DIR to further help in debugging code that worked locally but runs into some snags in the final dsub production pipeline.

It would be very helpful if there were at least an option to push up those intermediate files written to OUTPUT_DIR before the script failed, to help with debugging (or to amend the script so it can continue from where it stopped). Even better, perhaps allow the dsub instance to stay alive after a script failure (obviously not as the default setting) so that the user can log in and diagnose the problem in the production environment."
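In the meantime, one possible pattern is to salvage partial outputs from inside the job script itself: wrap the real work so that, on any error, whatever is already in the local output directory is copied to a debug location before the task exits nonzero. This is a sketch, not a dsub feature: it assumes the job's Docker image includes `gsutil` and that the job's service account can write to the debug bucket; `DEBUG_BUCKET` and the `step_*.sh` names are placeholders, and `OUTPUT_DIR` stands for the local path dsub maps to an `--output-recursive` parameter.

```bash
#!/bin/bash
# Hypothetical salvage pattern for a dsub job script.
set -o errexit

salvage() {
  echo "Task failed; copying partial outputs for debugging" >&2
  # Best-effort copy of whatever was produced before the failure.
  gsutil -m cp -r "${OUTPUT_DIR}" "${DEBUG_BUCKET}/failed-task/" || true
}
trap salvage ERR

step_one.sh   > "${OUTPUT_DIR}/step1.log"
step_two.sh   > "${OUTPUT_DIR}/step2.log"
step_three.sh > "${OUTPUT_DIR}/step3.log"
```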