coursera / dataduct

DataPipeline for humans.

add a default topic arn to the global config file #240

Open p5k6 opened 8 years ago

p5k6 commented 8 years ago

Currently there is no way to specify a default topic ARN in the config file (the current documentation for creating an ETL is incorrect in that regard).

Rather than update the existing documentation, I propose adding DEFAULT_TOPIC_ARN to the global config, which can be overridden in the ETL pipeline itself. At least in our case, every pipeline running in a given mode shares the same topic_arn.

I will open a PR to add this functionality.
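
To make the proposed lookup order concrete, here is a minimal sketch of how the override could be resolved. It is not dataduct's actual code: plain dictionaries stand in for the parsed global config and the pipeline definition, the `etl` section name and the function name `resolve_topic_arn` are assumptions made for illustration, and the ARNs are placeholders.

```python
def resolve_topic_arn(pipeline_definition, global_config):
    """Return the pipeline's own topic_arn if set, else the global default.

    Hypothetical helper: 'etl' as the config section holding
    DEFAULT_TOPIC_ARN is an assumption, not confirmed dataduct behavior.
    """
    # A value set in the pipeline definition always wins.
    pipeline_arn = pipeline_definition.get('topic_arn')
    if pipeline_arn:
        return pipeline_arn

    # Otherwise fall back to the global default, which may be absent (None).
    return global_config.get('etl', {}).get('DEFAULT_TOPIC_ARN')


# Placeholder values, for illustration only.
global_config = {
    'etl': {'DEFAULT_TOPIC_ARN': 'arn:aws:sns:us-east-1:123456789012:example-default'}
}
pipeline_with_override = {
    'name': 'example_pipeline',
    'topic_arn': 'arn:aws:sns:us-east-1:123456789012:example-override',
}
pipeline_without_override = {'name': 'example_pipeline'}

print(resolve_topic_arn(pipeline_with_override, global_config))
print(resolve_topic_arn(pipeline_without_override, global_config))
```

With this ordering, pipelines that say nothing about a topic pick up the shared default, and a pipeline that needs a different topic only has to set its own `topic_arn`.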

zerowgravity commented 8 years ago

I'm having issues with IAM role permissions and the ability to set up SNS. Here is my stack trace:

raise ETLInputError('Pipeline has errors %s' % self.errors)
dataduct.utils.exceptions.ETLInputError: Pipeline has errors [
{u'errors': [u"Invalid role 'EMR_DefaultRole' in slot 'role'. Please make sure that the role exist and Data Pipeline has permission to assume the role."], u'id': u'SNSAlarm0@TransformStep0.S3Node0'},
{u'errors': [u"Invalid role 'EMR_DefaultRole' in slot 'role'. Please make sure that the role exist and Data Pipeline has permission to assume the role."], u'id': u'SNSAlarm0@TransformStep0.ShellCommandActivity0'}]

Any help would be appreciated.

p5k6 commented 8 years ago

I think you need to go to IAM in AWS and attach a policy that covers EMR_DefaultRole, such as AmazonElasticMapReduceRole, to the user that's configured for dataduct. This can be done under the Permissions tab for that user in IAM.

zerowgravity commented 8 years ago

Thanks, I fixed it by adding a Default policy for the Amazon EC2 Role for Data Pipeline service role.
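
For anyone hitting the same error: the message asks that Data Pipeline be able to assume the role, which is governed by the role's trust policy. The sketch below is one way to check a trust policy document offline. It assumes you have already exported the role's trust policy as JSON; the helper name `allows_service` and the example document are hypothetical, not taken from this thread, and the code makes no AWS calls.

```python
def allows_service(trust_policy, service):
    """Return True if the trust policy lets `service` call sts:AssumeRole."""
    statements = trust_policy.get("Statement", [])
    if isinstance(statements, dict):
        # A single statement may be given as an object instead of a list.
        statements = [statements]

    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue

        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if "sts:AssumeRole" not in actions:
            continue

        principal = statement.get("Principal", {})
        if not isinstance(principal, dict):
            # Wildcard or other non-service principals are ignored here.
            continue
        services = principal.get("Service", [])
        if isinstance(services, str):
            services = [services]
        if service in services:
            return True

    return False


# Hypothetical trust policy that only trusts EMR, not Data Pipeline.
example_trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "elasticmapreduce.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

print(allows_service(example_trust_policy, "elasticmapreduce.amazonaws.com"))
print(allows_service(example_trust_policy, "datapipeline.amazonaws.com"))
```

If the check fails for `datapipeline.amazonaws.com`, the role named in the pipeline's `role` slot likely needs that service added as a trusted principal, or the pipeline should point at a role that already trusts it.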