[Closed] dfsnow closed this 2 months ago
One question I had related to error handling while I was reading this code: a few of the client methods are wrapped in `try`/`except` blocks that prevent them from raising an exception, but what happens if a code block that is not protected in this way causes the job to error out before it can upload its logs to CloudWatch? How will we get alerted that the job has failed? I wonder if it's worth wrapping `main()` in a big `try`/`except` block that logs the exception, attempts to ship logs to CloudWatch, and alerts us before raising the exception.
So, the way I'd originally set this up, it wasn't possible to use the Spark logger in the `except` block, since the logger wouldn't exist if the main loop failed.
In the process of refactoring I discovered you totally can just use both the Spark AND Python loggers at the same time. So, I switched over to generic Python logging in lieu of passing around the Spark logger. This gets us:
I think it's a much better design overall, but curious to see what you think. Lots of changes here, so re-requesting review @jeancochrane!
This PR adds logging and error handling using the log4j driver pulled from the Spark session context. I chose to use this logger rather than standard Python logging because I want to capture the Spark output and intersperse it with Python logging.
This PR also adds a new `AWSClient` class that can trigger Glue jobs and upload finished log files to CloudWatch. I refactored the GitHub session class to more closely match the AWS one. Here's an example log output in CloudWatch.
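A class like the one described might be sketched as follows. The method names and internals are assumptions, not the PR's actual implementation; the Glue and CloudWatch Logs calls (`start_job_run`, `create_log_stream`, `put_log_events`) are real boto3 APIs, and the clients are injected so the class can be exercised without AWS access (in the job you would pass `boto3.client("glue")` and `boto3.client("logs")`):

```python
import time


class AWSClient:
    """Minimal sketch of a client that triggers Glue jobs and ships logs."""

    def __init__(self, glue_client, logs_client):
        # Injected boto3 clients (or fakes, for testing)
        self.glue = glue_client
        self.logs = logs_client

    def run_glue_job(self, job_name: str) -> str:
        # Start a Glue job run and return its run ID
        response = self.glue.start_job_run(JobName=job_name)
        return response["JobRunId"]

    def upload_logs(self, log_group: str, log_stream: str, lines: list) -> None:
        # Create a log stream, then ship each finished log line to CloudWatch
        self.logs.create_log_stream(
            logGroupName=log_group, logStreamName=log_stream
        )
        events = [
            {"timestamp": int(time.time() * 1000), "message": line}
            for line in lines
        ]
        self.logs.put_log_events(
            logGroupName=log_group, logStreamName=log_stream, logEvents=events
        )
```

Injecting the clients rather than constructing them inside the class keeps the error-handling path testable without real AWS credentials.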