awslabs / amazon-emr-cli

A command-line interface for packaging, deploying, and running your EMR Serverless Spark jobs
Apache License 2.0
35 stars 12 forks source link

Add support for `--show-logs` in cluster mode on EMR on EC2 #12

Open dacort opened 1 year ago

dacort commented 1 year ago

With the recent --show-logs flag, we switch the deploy mode to client so that EMR steps can capture the driver stdout.

Unfortunately, --client mode doesn't work with additional archives provided via the --archives flag or --conf spark.archives parameter. See https://issues.apache.org/jira/browse/SPARK-36088 for more a related issue.

In order to support this for cluster mode, we'd need to parse the step stderr logs to retrieve the Yarn application ID, then fetch the Yarn application logs from S3.