coursera / dataduct

DataPipeline for humans.

No module named dataduct.steps.executors.create_load_redshift error... #211

Closed donigian closed 8 years ago

donigian commented 8 years ago

I'm getting the following error after specifying the two steps below.

amazonaws.datapipeline.taskrunner.TaskExecutionException: Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: No module named dataduct.steps.executors.create_load_redshift
    at amazonaws.datapipeline.activity.ShellCommandActivity.runActivity(ShellCommandActivity.java:93)
    at amazonaws.datapipeline.objects.AbstractActivity.run(AbstractActivity.java:16)
    at amazonaws.datapipeline.taskrunner.TaskPoller.executeRemoteRunner(TaskPoller.java:136)
    at amazonaws.datapipeline.taskrunner.TaskPoller.executeTask(TaskPoller.java:105)
    at amazonaws.datapipeline.taskrunner.TaskPoller$1.run(TaskPoller.java:81)
    at private.com.amazonaws.services.datapipeline.poller.PollWorker.executeWork(PollWorker.java:76)
    at private.com.amazonaws.services.datapipeline.poller.PollWorker.run(PollWorker.java:53)
    at java.lang.Thread.run(Thread.java:745)

Using this template:

name: example_sql_command
frequency: one-time
load_time: 01:00  # Hour:Min in UTC

description: Example for the sql_command step

steps:
-   step_type: extract-local
    path: data/test_db.csv
-   step_type: create-load-redshift
    table_definition: tables/lookup.test_dp2.sql

Using this config in ~/.dataduct/dataduct.cfg:

redshift:
    CLUSTER_ID: xxx
    DATABASE_NAME: xxx
    HOST: xxx
    PASSWORD: xxx
    USERNAME: xxx
    PORT: 5439
logging:
    CONSOLE_DEBUG_LEVEL: INFO
    FILE_DEBUG_LEVEL: DEBUG
    LOG_DIR: ~/.dataduct
    LOG_FILE: dataduct.log
etl:
    REGION: us-east-1
    S3_ETL_BUCKET: xxx
    S3_BASE_PATH: xxx
    ROLE: DataPipelineDefaultRole
    RESOURCE_ROLE: DataPipelineDefaultResourceRole
mysql:
    host_alias_1:
        HOST: FILL_ME_IN
        PASSWORD: FILL_ME_IN
        USERNAME: FILL_ME_IN
ec2:
    INSTANCE_TYPE: m1.small
    ETL_AMI: ami-05355a6c
    SECURITY_GROUP_IDS: xxx
    SUBNET_ID: xxx
emr:
    MASTER_INSTANCE_TYPE: m1.large
    NUM_CORE_INSTANCES: 1
    CORE_INSTANCE_TYPE: m1.large
    CLUSTER_AMI: 3.1.0

AaronTorgerson commented 8 years ago

@donigian I am having the same problem. It turns out that dataduct (and all its dependencies) has to be installed on the EC2 instance that gets spun up to run your pipeline. I am going to try to solve this with a custom AMI.
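
One way to confirm that diagnosis on a candidate AMI is to try the same import the failing activity attempts. The snippet below is only a minimal sketch: the helper name `missing_modules` is made up for illustration, and the only module path it checks is the one from the traceback in this issue. It assumes you can run Python on the instance yourself, for example over SSH.

```python
import importlib


def missing_modules(names):
    """Return the module names from `names` that cannot be imported here."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # Covers both a missing package and a missing submodule.
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Module path copied from the ImportError in this issue.
    required = ["dataduct.steps.executors.create_load_redshift"]
    for name in missing_modules(required):
        print("cannot import: " + name)
```

If the script prints nothing, the import that failed in the pipeline succeeds on that machine. If it prints the module name, dataduct is not installed for that Python interpreter.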

cliu587 commented 8 years ago

Closed. We currently go the custom AMI route.

pollosp commented 8 years ago

Hi, what does a clean AMI need to have installed to run the pipelines?