With the recent --show-logs flag, we switch the deploy mode to client so that EMR steps can capture the driver stdout.
Unfortunately, client mode doesn't work with additional archives provided via the --archives flag or the --conf spark.archives parameter. See https://issues.apache.org/jira/browse/SPARK-36088 for a related issue.
To support --show-logs in cluster mode, we'd need to parse the step stderr logs to retrieve the YARN application ID, then fetch the YARN application logs from S3.
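As a rough sketch of the parsing half of that approach, the snippet below pulls the first YARN application ID out of step stderr text and builds an S3 prefix to look under. The function names are hypothetical, and the `<log-uri>/<cluster-id>/containers/<application-id>/` layout is an assumption about where EMR writes container logs that should be checked against the cluster's configured log URI.

```python
import re
from typing import Optional

# YARN application IDs have the form application_<cluster timestamp>_<sequence>.
APP_ID_PATTERN = re.compile(r"application_\d+_\d+")


def find_yarn_application_id(stderr_text: str) -> Optional[str]:
    """Return the first YARN application ID found in the text, or None."""
    match = APP_ID_PATTERN.search(stderr_text)
    return match.group(0) if match else None


def container_log_prefix(log_uri: str, cluster_id: str, application_id: str) -> str:
    """Build the S3 prefix for an application's container logs.

    Assumption: logs live under <log_uri>/<cluster_id>/containers/<application_id>/.
    """
    return f"{log_uri.rstrip('/')}/{cluster_id}/containers/{application_id}/"


if __name__ == "__main__":
    # Illustrative stderr line; real step logs may be worded differently.
    sample = "INFO Client: Application report for application_1626084000000_0001 (state: RUNNING)"
    app_id = find_yarn_application_id(sample)
    if app_id is not None:
        # Bucket name and cluster ID here are placeholders.
        print(container_log_prefix("s3://my-bucket/logs/", "j-EXAMPLE", app_id))
```

Fetching and decompressing the objects under that prefix is left out here, since it depends on how the tool already talks to S3.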