Calenduck / Calenduck-DATA


Add an EMR step remotely #6

Open GSJ7750 opened 1 year ago

GSJ7750 commented 1 year ago

Prerequisite ticket for https://github.com/Calenduck/Calenduck-DATA/issues/5

GSJ7750 commented 1 year ago

import boto3
from botocore.exceptions import ClientError

# Create an EMR client
emr_client = boto3.client('emr', region_name='ap-northeast-2')

# Define the step parameters
step_name = 'competition data ingestion'
jar_path = 's3://competition-for-culture-data/jar/competitionPipelines.jar'
main_class = 'com.competition.saveNameWithMt20IDJob'
step_args = ['--class', main_class, jar_path]  # Additional arguments for spark-submit

# Create the step configuration
step = {
    'Name': step_name,
    'ActionOnFailure': 'CONTINUE',
    'HadoopJarStep': {
        'Jar': 'command-runner.jar',
        'Args': ['spark-submit'] + step_args,
    },
}

# Add the step to the EMR cluster; boto3 raises ClientError if the request is rejected
try:
    response = emr_client.add_job_flow_steps(JobFlowId='j-3AC8Z96GV7K7I', Steps=[step])
    print('Step added successfully:', response['StepIds'])
except ClientError as error:
    print('Error occurred while adding step:', error)
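A successful `add_job_flow_steps` call only means the step was queued, not that the Spark job finished. A minimal sketch of polling the step until it reaches a terminal state, assuming the boto3 EMR `describe_step` call; `wait_for_step`, `poll_seconds`, `max_polls`, and the injectable `sleep` are hypothetical names introduced here for illustration and are not part of the original script:

```python
import time

# Terminal states that EMR reports for a step.
TERMINAL_STATES = {"COMPLETED", "CANCELLED", "FAILED", "INTERRUPTED"}


def wait_for_step(client, cluster_id, step_id,
                  poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Poll describe_step until the step is terminal, then return its state."""
    for _ in range(max_polls):
        result = client.describe_step(ClusterId=cluster_id, StepId=step_id)
        state = result["Step"]["Status"]["State"]
        if state in TERMINAL_STATES:
            return state
        # Step is still pending or running; wait before asking again.
        sleep(poll_seconds)
    raise TimeoutError(f"Step {step_id} did not finish after {max_polls} polls")
```

With the script above this could be called as `wait_for_step(emr_client, 'j-3AC8Z96GV7K7I', response['StepIds'][0])`. boto3 also ships a built-in `step_complete` waiter that may be preferable if only success matters.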

GSJ7750 commented 1 year ago

timezone: Asia/Seoul

schedule:
  cron>: 10 1 *

+ingestion:
  sh>: python3 /home/ec2-user/add_ingestion_step.py
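Since the workflow launches the script through a shell command, the scheduler can only see the script's exit status. A rough sketch of mapping the outcome to an exit code, assuming the `sh>` operator treats a non-zero exit status as a task failure; `run` and its `add_step` callable are hypothetical names, not part of the original `add_ingestion_step.py`:

```python
import sys


def run(add_step):
    """Call add_step() and translate its outcome into a process exit code."""
    try:
        step_ids = add_step()
    except Exception as exc:  # e.g. a botocore ClientError from the EMR call
        print(f"Failed to add step: {exc}", file=sys.stderr)
        return 1
    if not step_ids:
        print("No step ID was returned", file=sys.stderr)
        return 1
    print("Added step(s):", ", ".join(step_ids))
    return 0
```

In the script this might end with `sys.exit(run(lambda: emr_client.add_job_flow_steps(JobFlowId='j-3AC8Z96GV7K7I', Steps=[step])['StepIds']))`, so a rejected request surfaces as a failed task rather than a silent success.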

GSJ7750 commented 1 year ago

Image

GSJ7750 commented 1 year ago

Image