aws-samples / emr-serverless-samples

Example code for running Spark and Hive jobs on EMR Serverless.
https://aws.amazon.com/emr/serverless/
MIT No Attribution

EMR Serverless Adding Option to Boto3 for Glue Catalog #53

Closed. soumilshah1995 closed this issue 8 months ago.

soumilshah1995 commented 1 year ago

AWS has announced the open-source EMR CLI tool:

https://aws.amazon.com/blogs/big-data/build-deploy-and-run-spark-jobs-on-amazon-emr-with-the-open-source-emr-cli-tool/

 

I have tried it, and the CLI works great; it simplifies submitting jobs.

However, could you tell us how to enable the Glue Hive metastore when submitting a job via the CLI or Boto3? I have looked at the documentation, and I don't see an argument for enabling the Glue Catalog option in Boto3.


Here is a sample of how we are submitting jobs with the EMR CLI:

emr run \
    --entry-point entrypoint.py \
    --application-id <application-id> \
    --job-role <arn> \
    --s3-code-uri s3://<bucket>/emr_scripts/ \
    --spark-submit-opts "--conf spark.jars=/usr/lib/hudi/hudi-spark-bundle.jar --conf spark.serializer=org.apache.spark.serializer.KryoSerializer" \
    --build \
    --wait

 

I have also created a GitHub issue: https://github.com/awslabs/amazon-emr-cli/issues/18

If you can kindly get back to us on this issue, that would be great 😃

Neubauer-A commented 8 months ago

I've been able to get this to work by specifying the Glue Data Catalog metastore factory class in sparkSubmitParameters. For example:

import boto3

# EMR Serverless client; application_id and job_role_arn are placeholders for your values
client = boto3.client('emr-serverless')

job_run = client.start_job_run(
    applicationId=application_id,
    executionRoleArn=job_role_arn,
    jobDriver={
        'sparkSubmit': {
            'entryPoint': 's3://bucket/your_script.py',
            'entryPointArguments': [],
            # Use the AWS Glue Data Catalog as the Hive metastore for this job
            'sparkSubmitParameters': '--conf spark.hadoop.hive.metastore.client.factory.class=com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory'
        },
    }
)
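
For the EMR CLI path asked about above, the same conf can presumably be appended to the --spark-submit-opts string, assuming the CLI forwards those options to spark-submit unchanged. A minimal sketch based on the command from the question, with <application-id> and <bucket> as illustrative placeholders:

emr run \
    --entry-point entrypoint.py \
    --application-id <application-id> \
    --job-role <arn> \
    --s3-code-uri s3://<bucket>/emr_scripts/ \
    --spark-submit-opts "--conf spark.jars=/usr/lib/hudi/hudi-spark-bundle.jar --conf spark.serializer=org.apache.spark.serializer.KryoSerializer --conf spark.hadoop.hive.metastore.client.factory.class=com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory" \
    --build \
    --wait

The only change from the original command is the extra spark.hadoop.hive.metastore.client.factory.class conf at the end of --spark-submit-opts, mirroring the sparkSubmitParameters setting shown in the Boto3 example above.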