Auto-persisting stdout and stderr

zsusswein commented 3 months ago

Azure Batch writes the stdout and stderr of each task to files on the node (stdout.txt and stderr.txt in the task directory). These text files are available while the node is up but are not persisted otherwise. That makes the error messages hard to get at when debugging, especially before proper logging or try/catch logic has been set up.

It might be nice to borrow from @kaitejohnson's implementation here to automatically write stdout and stderr to files. It would take some refactoring of how the Docker command is specified, but it seems like it should be possible by modifying the Docker command and requiring an output directory, which could be the mounted blobfuse directory plus a path here.
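
A rough sketch of the Docker-command approach, assuming the task's command line runs on a Linux node and calls `docker run` itself (rather than using Batch's container task settings). The helper name `build_logged_command` and the stdout.txt/stderr.txt layout are hypothetical; `log_dir` would be a path inside the blobfuse-mounted directory so the files outlive the node.

```python
import shlex
from typing import Sequence


def build_logged_command(image: str, args: Sequence[str], log_dir: str) -> str:
    """Hypothetical helper: build a Linux task command line that runs `args`
    in `image` and saves the container's stdout/stderr as files in `log_dir`."""
    out = shlex.quote(f"{log_dir.rstrip('/')}/stdout.txt")
    err = shlex.quote(f"{log_dir.rstrip('/')}/stderr.txt")
    inner = (
        f"mkdir -p {shlex.quote(log_dir)} && "
        f"docker run --rm {shlex.quote(image)} {shlex.join(args)} "
        f"> {out} 2> {err}"
    )
    # The redirection is done by the node's shell, not inside the container,
    # so the image does not need a shell of its own. `sh -c` exits with
    # docker run's status, so a failing model still fails the task.
    return "/bin/sh -c " + shlex.quote(inner)
```

For example, `build_logged_command("rocker/r-ver:4.3", ["Rscript", "model.R"], "/mnt/blob/logs/run1")` (image and paths made up) yields a single `/bin/sh -c '...'` string. One trade-off: because the output is redirected, Batch's own stdout.txt for the task will be mostly empty.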

Alternatively @damonbayer noted that:

I didn't get around to testing it out, but I thought this looked like the "official" way to keep the logs: https://learn.microsoft.com/en-us/azure/batch/batch-task-output-files. I'm not sure if it is any easier/better than our method.
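
For reference, the linked page describes Batch's output-files feature. A minimal sketch, assuming the REST API's "add task" JSON shape (`outputFiles`, `filePattern`, `destination.container.containerUrl`, `destination.container.path`, `uploadOptions.uploadCondition`); the helper name and the placeholder URL are hypothetical, and the Python SDK expresses the same thing through model classes instead of a dict.

```python
import json
from typing import Any, Dict


def task_with_persisted_logs(
    task_id: str, command_line: str, container_sas_url: str, blob_prefix: str
) -> Dict[str, Any]:
    """Hypothetical helper: build an "add task" JSON body whose outputFiles
    entry uploads stdout.txt and stderr.txt to blob storage."""
    return {
        "id": task_id,
        "commandLine": command_line,
        "outputFiles": [
            {
                # The command runs in the task's "wd" working directory;
                # stdout.txt and stderr.txt sit one level up.
                "filePattern": "../std*.txt",
                "destination": {
                    "container": {
                        # Must grant write access, e.g. via a SAS token.
                        "containerUrl": container_sas_url,
                        # With a wildcard pattern, path is a virtual-directory prefix.
                        "path": blob_prefix,
                    }
                },
                # Upload whether the task succeeds or fails.
                "uploadOptions": {"uploadCondition": "taskCompletion"},
            }
        ],
    }


if __name__ == "__main__":
    body = task_with_persisted_logs(
        "task-1",
        "/bin/sh -c 'Rscript model.R'",
        "https://<account>.blob.core.windows.net/<container>?<sas>",
        "logs/task-1",
    )
    print(json.dumps(body, indent=2))
```

The appeal of this route is that the Batch agent does the upload after the task ends, so nothing about the Docker command has to change.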

and @kaitejohnson mentioned:

I think defaulting to having something save to blob storage would be great, especially for first-time users! @dylanhmorris figured out how to do this even though the Azure task command line is not run in a shell, as described here, which is why you need the /bin/sh command at the beginning.
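
Azure Batch documents that a task's command line is not run under a shell, so features like `>` redirection only work when the command is wrapped in `/bin/sh -c` on Linux. Python's `subprocess` behaves the same way by default, which gives a quick local illustration; `redirect_demo` is a name made up for this sketch, and it assumes a Linux or macOS machine with `/bin/sh` and `echo`.

```python
import os
import shlex
import subprocess
import tempfile
from typing import Tuple


def redirect_demo() -> Tuple[str, bool, str]:
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "stdout.txt")

        # No shell: ">" and the path are just extra arguments to echo,
        # which prints them; no file is created.
        direct = subprocess.run(
            ["echo", "hello", ">", target], capture_output=True, text=True, check=True
        )
        created = os.path.exists(target)

        # Through /bin/sh -c: the shell parses ">" and does the redirection.
        subprocess.run(["/bin/sh", "-c", f"echo hello > {shlex.quote(target)}"], check=True)
        with open(target) as f:
            contents = f.read()
    return direct.stdout, created, contents
```

Calling `redirect_demo()` returns echo's literal `hello > /tmp/...` output, `False` for the first attempt, and `"hello\n"` read back from the file the shell created.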

I don't think this would be a replacement for proper logging, but it would be a nice quality-of-life improvement when getting things set up.

I originally opened this issue on the deprecated internal repo and there's a bit of conversation there, but Nate recommended that I move it over here. Tagging in the people from that issue @natemcintosh @ryanraaschCDC @kaitejohnson @ChiragKumar9 @damonbayer @dylanhmorris

arik-shurygin commented 3 months ago

@zsusswein @kaitejohnson @natemcintosh I know this is a relatively old conversation, but Scenarios has built an in-Python solution to this by routing sys.stderr and sys.stdout through Python classes that mimic the file interface and write out to files as a side effect. See https://github.com/cdcent/cfa-scenarios-model/blob/84e3fa8f90d4de5236ed19e6855a9d0621c09b02/utils.py#L2475-L2518 for the classes themselves.

This way you can just create these two objects at the top of your file and continue as normal, with no need to modify anything about the node. Of course, this fails if you are doing non-Python stuff.
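
The linked Scenarios classes are not reproduced here; this is an independent minimal sketch of the same idea, with hypothetical names (`Tee`, `tee_std_streams`). It only implements `write` and `flush`, so code that calls something like `sys.stdout.fileno()` would break.

```python
import os
import sys
from typing import TextIO, Tuple


class Tee:
    """Minimal file-like object: every write goes to all wrapped streams."""

    def __init__(self, *streams: TextIO) -> None:
        self.streams = streams

    def write(self, text: str) -> int:
        for stream in self.streams:
            stream.write(text)
        return len(text)

    def flush(self) -> None:
        for stream in self.streams:
            stream.flush()


def tee_std_streams(log_dir: str) -> Tuple[TextIO, TextIO]:
    """Hypothetical helper: mirror sys.stdout/sys.stderr into files in log_dir.
    Returns the open log files so the caller can close them at the end."""
    os.makedirs(log_dir, exist_ok=True)
    # buffering=1 makes the files line-buffered, so each completed line
    # reaches disk promptly.
    out_file = open(os.path.join(log_dir, "stdout.txt"), "a", buffering=1)
    err_file = open(os.path.join(log_dir, "stderr.txt"), "a", buffering=1)
    sys.stdout = Tee(sys.stdout, out_file)
    sys.stderr = Tee(sys.stderr, err_file)
    return out_file, err_file
```

Since Python's default exception hook prints tracebacks to `sys.stderr`, uncaught errors also land in stderr.txt. Output from subprocesses (an `Rscript` call, say) or from C code writing straight to file descriptors 1 and 2 bypasses these objects, which is the non-Python limitation mentioned above.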

zsusswein commented 3 months ago

That's fun!

But yeah, that doesn't work for our model runs because we're using lots of R (and maybe Julia in the future?). I think we need a node-level solution, but I'd love to be wrong?

arik-shurygin commented 3 months ago

I think you are right. This feels like something that Azure must have had many requests for. Scenarios just built this solution because we knew our entire pipeline was in Python, so we would not run into any issues with it.

I am finding things like this: https://learn.microsoft.com/en-us/azure/batch/batch-task-output-files, but it feels like Azure is allergic to making a simple solution to something that is inherently pretty simple...

A workaround would be to wrap your R/Julia calls in a shell script that captures the output and saves it as well.
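
The same wrapper idea can be written in Python (a shell script using `tee` would do the job too). This is a minimal sketch with a hypothetical `run_logged` helper: it runs any command, such as an `Rscript` or `julia` call, echoes its output to the console so Batch's own stdout.txt/stderr.txt still fill up, and saves copies to files of your choosing.

```python
import subprocess
import sys
import threading
from typing import IO, List, Sequence


def _pump(src: IO[str], sinks: List[IO[str]]) -> None:
    # Copy the child's output line by line into every sink.
    for line in iter(src.readline, ""):
        for sink in sinks:
            sink.write(line)
            sink.flush()
    src.close()


def run_logged(cmd: Sequence[str], stdout_path: str, stderr_path: str) -> int:
    """Hypothetical wrapper: run cmd, echo its stdout/stderr to this process's
    streams, save copies to the given files, and return the exit code."""
    with open(stdout_path, "w", encoding="utf-8") as out_file, open(
        stderr_path, "w", encoding="utf-8"
    ) as err_file:
        proc = subprocess.Popen(
            list(cmd),
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            encoding="utf-8",
            errors="replace",
        )
        # One thread per pipe, so a full stderr pipe cannot stall stdout
        # (or vice versa) and deadlock the child.
        pumps = [
            threading.Thread(target=_pump, args=(proc.stdout, [sys.stdout, out_file])),
            threading.Thread(target=_pump, args=(proc.stderr, [sys.stderr, err_file])),
        ]
        for t in pumps:
            t.start()
        for t in pumps:
            t.join()
        return proc.wait()
```

A task would then call something like `sys.exit(run_logged(["Rscript", "model.R"], "/mnt/blob/logs/stdout.txt", "/mnt/blob/logs/stderr.txt"))` (paths made up), passing the child's exit code through so Batch still sees failures.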

zsusswein commented 3 months ago

Yeah, that's approximately what @kaitejohnson did in her implementation. @damonbayer was also looking at your link and said it looked promising, but I think someone needs to spend some more time figuring out how to actually implement it.

CDCgov/cfa_azure issue #88