Closed soumilshah1995 closed 8 months ago
I've been able to get this to work by specifying it in `sparkSubmitParameters`. For example:

```python
import boto3

# EMR Serverless client; application_id and job_role_arn are defined elsewhere
client = boto3.client("emr-serverless")

job_run = client.start_job_run(
    applicationId=application_id,
    executionRoleArn=job_role_arn,
    jobDriver={
        'sparkSubmit': {
            'entryPoint': 's3://bucket/your_script.py',
            'entryPointArguments': [],
            'sparkSubmitParameters': '--conf spark.hadoop.hive.metastore.client.factory.class=com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory'
        },
    }
)
```
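Note that `sparkSubmitParameters` is a single space-separated string, so several `--conf` entries (for example the Glue catalog factory plus a serializer setting) can be combined in it. A small hypothetical helper, not part of boto3 or the EMR CLI, sketches one way to build that string from a dict:

```python
# Hypothetical helper: turn a dict of Spark confs into a
# sparkSubmitParameters string suitable for start_job_run.
def build_spark_submit_params(confs):
    return " ".join(f"--conf {key}={value}" for key, value in confs.items())

confs = {
    "spark.hadoop.hive.metastore.client.factory.class":
        "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
}

params = build_spark_submit_params(confs)
# params now holds both --conf flags in one string, ready to pass as
# 'sparkSubmitParameters' in the jobDriver above.
```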
AWS has announced the EMR CLI:
https://aws.amazon.com/blogs/big-data/build-deploy-and-run-spark-jobs-on-amazon-emr-with-the-open-source-emr-cli-tool/
I have tried it, and the CLI works great; it really simplifies submitting jobs.
However, could you tell us how to enable the Glue Hive metastore when submitting a job via the CLI or via boto3? I have looked at the documentation and I don't see an argument for supplying the "use Glue Catalog" option in boto3.
Here is a sample of how we are submitting jobs with the EMR CLI:
```
emr run /emr_scripts/ \
  --entry-point entrypoint.py \
  --application-id <application-id> \
  --job-role <arn> \
  --s3-code-uri s3://<bucket> \
  --spark-submit-opts "--conf spark.jars=/usr/lib/hudi/hudi-spark-bundle.jar --conf spark.serializer=org.apache.spark.serializer.KryoSerializer" \
  --build \
  --wait
```

I have also created a GitHub issue: https://github.com/awslabs/amazon-emr-cli/issues/18
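Since the Glue catalog conf works through `sparkSubmitParameters` in boto3, presumably the same conf can be passed through the CLI's `--spark-submit-opts` flag. An untested sketch, where `<application-id>`, `<arn>`, and `s3://<bucket>` are placeholders:

```shell
# Sketch: append the Glue catalog factory conf to --spark-submit-opts
emr run /emr_scripts/ \
  --entry-point entrypoint.py \
  --application-id <application-id> \
  --job-role <arn> \
  --s3-code-uri s3://<bucket> \
  --spark-submit-opts "--conf spark.hadoop.hive.metastore.client.factory.class=com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory" \
  --build --wait
```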
If you could kindly get back to us on this issue, that would be great 😃