coursera / dataduct

DataPipeline for humans.

Pip Install Missing Some Files #161

Closed warhammerkid closed 8 years ago

warhammerkid commented 8 years ago

I'm trying to use the create-load-redshift step and I get an exception when I try to create the pipeline:

Traceback (most recent call last):
  File "/usr/local/bin/dataduct", line 317, in <module>
    main()
  File "/usr/local/bin/dataduct", line 309, in main
    pipeline_actions(frequency_override=frequency_override, **arg_vars)
  File "/usr/local/bin/dataduct", line 80, in pipeline_actions
    activate_pipeline(etl)
  File "/Library/Python/2.7/site-packages/dataduct/utils/hook.py", line 66, in function_wrapper
    result = func(*new_args, **new_kwargs)
  File "/Library/Python/2.7/site-packages/dataduct/etl/etl_actions.py", line 82, in activate_pipeline
    etl.activate()
  File "/Library/Python/2.7/site-packages/dataduct/etl/etl_pipeline.py", line 645, in activate
    s3_file.upload_to_s3()
  File "/Library/Python/2.7/site-packages/dataduct/s3/s3_file.py", line 48, in upload_to_s3
    upload_to_s3(self._s3_path, self._path, self._text)
  File "/Library/Python/2.7/site-packages/dataduct/s3/utils.py", line 65, in upload_to_s3
    key.set_contents_from_filename(file_name)
  File "/Library/Python/2.7/site-packages/boto/s3/key.py", line 1358, in set_contents_from_filename
    with open(filename, 'rb') as fp:
IOError: [Errno 2] No such file or directory: '/Library/Python/2.7/site-packages/dataduct/steps/scripts/create_load_redshift_runner.py'

I downloaded the tar file that's up at https://pypi.python.org/pypi/dataduct/0.3.0 and it appears to be missing the steps/scripts folder entirely.
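A minimal sketch of how the archive could be checked for the missing folder, using only the standard-library `tarfile` module. The helper name `sdist_has_dir` and the local filename in the usage comment are hypothetical. The sketch also assumes the usual sdist layout with a single top-level `<name>-<version>/` folder.

```python
import tarfile


def sdist_has_dir(archive_path, rel_dir):
    """Return True if any archive member lives under rel_dir.

    rel_dir is relative to the top-level folder inside the archive,
    for example "dataduct/steps/scripts".
    """
    wanted = rel_dir.strip("/").split("/")
    with tarfile.open(archive_path) as tar:
        for name in tar.getnames():
            # Drop the top-level "<name>-<version>" folder (assumed layout).
            inner = name.strip("/").split("/")[1:]
            if inner[:len(wanted)] == wanted:
                return True
    return False


# Usage (hypothetical local filename for the downloaded archive):
#   sdist_has_dir("dataduct-0.3.0.tar.gz", "dataduct/steps/scripts")
```

If the function returns False for the downloaded archive, the folder was left out when the package was built rather than lost during installation.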

sb2nov commented 8 years ago

Sounds like a build issue. I'll fix this ASAP.

drelfi commented 8 years ago

Any news on this? The same happens with the pipeline dependencies step. I've tried copying the scripts folder manually into the installation folder, but then the pipeline fails because dataduct is not installed on the EC2 instance. Does dataduct need to run on a particular AMI?

sb2nov commented 8 years ago

@drelfi I think you need to install dataduct on the Resource first. I'll create a public image soon.