zsusswein opened this issue 3 months ago
@zsusswein @kaitejohnson @natemcintosh I know this is a relatively old conversation, but Scenarios built an in-Python solution to this by routing `sys.stderr` and `sys.stdout` through Python classes that mimic the stream interface and write out to files as a side effect. See https://github.com/cdcent/cfa-scenarios-model/blob/84e3fa8f90d4de5236ed19e6855a9d0621c09b02/utils.py#L2475-L2518 for the classes themselves. This way you can just create these two objects at the top of your file and continue as normal, with no need to modify anything about the node. This of course fails if you are doing non-Python stuff.
That's fun!
But yeah, that doesn't work for our model runs because we're using lots of R (and maybe Julia in the future?). I think we need a node-level solution, but I'd love to be wrong?
I think you are right. This feels like something that Azure must have had many requests for. Scenarios just built this solution since we knew our entire pipeline was in Python, so we would not run into any issues with it.
I am finding things like this: https://learn.microsoft.com/en-us/azure/batch/batch-task-output-files but it feels like Azure is allergic to making a simple solution to something which is inherently pretty simple...
A workaround would be to wrap your R/Julia calls in a shell script which captures output and saves it as well.
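A sketch of that wrapper pattern (shown here in Python via `subprocess` for consistency with the rest of the thread, though a couple of lines of shell redirection would do the same; the `Rscript` invocation and log file names are just illustrative assumptions):

```python
import subprocess

def run_and_capture(cmd, stdout_path, stderr_path):
    """Run a non-Python command (e.g. an Rscript or Julia call) and
    persist its stdout/stderr to files, returning the exit code so a
    failure can still be surfaced to the caller. Hypothetical helper."""
    with open(stdout_path, "w") as out, open(stderr_path, "w") as err:
        result = subprocess.run(cmd, stdout=out, stderr=err)
    return result.returncode

# Hypothetical usage:
# rc = run_and_capture(["Rscript", "model.R"], "stdout.log", "stderr.log")
```

Note this redirects rather than tees the streams, so output lands only in the files; a `tee`-style shell wrapper would keep the node's usual task streams populated too.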
Yeah, that's approximately what @kaitejohnson did in her implementation. @damonbayer was also looking at your link and said it looked promising, but I think someone needs to spend some more time figuring out how to actually implement it.
Azure Batch writes stdout and stderr from each task to files on the node. These text files are available while the node is up, but otherwise are not persisted. It can be difficult to access the error messages when debugging, especially when proper logging or try/catch logic hasn't been set up yet.
It might be nice to borrow from @kaitejohnson's implementation here to automatically write stdout and stderr to files. It would take a little wrangling/refactoring of the Docker command specification: modify the command and require an output directory, which could be the mounted blobfuse directory plus a path.
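To make the wrangling concrete, one way to modify the command might look like this (a hypothetical sketch; the function name, `sh -c` wrapping, and the mount/log paths are all assumptions, not anything this repo currently does):

```python
def wrap_command(base_cmd: str, log_dir: str) -> str:
    """Wrap a container command so its stdout/stderr also land in a
    mounted output directory (e.g. the blobfuse mount plus a path),
    where they persist after the node goes away. Hypothetical helper."""
    return (
        f"sh -c 'mkdir -p {log_dir} && "
        f"{base_cmd} > {log_dir}/stdout.txt 2> {log_dir}/stderr.txt'"
    )

# Hypothetical usage when building the Docker command for a task:
# cmd = wrap_command("Rscript model.R", "/mnt/blobfuse/outputs/logs")
```

Because the wrapped command is plain shell redirection, this works the same for R, Julia, or Python entrypoints, which is the node-level property the thread is after.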
Alternatively @damonbayer noted that:
and @kaitejohnson mentioned:
I don't think this would be a replacement for proper logging, but it would be a nice quality-of-life improvement when getting things set up.
I originally opened this issue on the deprecated internal repo and there's a bit of conversation there, but Nate recommended that I move it over here. Tagging in the people from that issue @natemcintosh @ryanraaschCDC @kaitejohnson @ChiragKumar9 @damonbayer @dylanhmorris