coursera / dataduct

DataPipeline for humans.

getting an error in extract-postgres step type #218

Closed ScottWang closed 8 years ago

ScottWang commented 8 years ago

Has anyone encountered this problem before? I am just trying to use the stock example_extract_postgres.yaml and I get the error below.
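For context, my pipeline definition boils down to roughly this (reconstructed from memory, so field values are approximate):

```yaml
# Approximate reconstruction of the definition I am validating; values
# are from memory and may not match the shipped example exactly.
name: example_extract_postgres
frequency: one-time
load_time: 01:00

steps:
-   step_type: extract-postgres
    sql: |
        select id from associations limit 10
    output_path: s3://discovery-import/datapipeline/test/one.txt
```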

```
[INFO]: Pipeline scheduled to start at 2016-02-11T01:00:00
Traceback (most recent call last):
  File "./dataduct", line 347, in <module>
    main()
  File "./dataduct", line 337, in main
    pipeline_actions(frequency_override=frequency_override, **arg_vars)
  File "./dataduct", line 80, in pipeline_actions
    frequency_override, backfill):
  File "./dataduct", line 55, in initialize_etl_objects
    etls.append(create_pipeline(definition))
  File "/Users/scotwang/GitSrc/third_party/dataduct/testdataduct/lib/python2.7/site-packages/dataduct/etl/etl_actions.py", line 55, in create_pipeline
    etl.create_steps(steps)
  File "/Users/scotwang/GitSrc/third_party/dataduct/testdataduct/lib/python2.7/site-packages/dataduct/etl/etl_pipeline.py", line 451, in create_steps
    steps_params = process_steps(steps_params)
  File "/Users/scotwang/GitSrc/third_party/dataduct/testdataduct/lib/python2.7/site-packages/dataduct/etl/utils.py", line 68, in process_steps
    params['step_class'] = STEP_CONFIG[step_type]
KeyError: 'extract-postgres'
```
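If I read the last frame right, process_steps resolves each step_type string through a STEP_CONFIG registry, so the KeyError just means the installed package never registered an 'extract-postgres' entry. Roughly this pattern (the registry contents below are invented for illustration; only the failing key is real):

```python
# Illustrative sketch of the lookup that fails in dataduct/etl/utils.py;
# the entries here are made up, not dataduct's actual registry.
STEP_CONFIG = {
    'extract-s3': 'ExtractS3Step',
    'extract-rds': 'ExtractRdsStep',
    # ... no 'extract-postgres' entry in the pip release
}

params = {}
step_type = 'extract-postgres'
params['step_class'] = STEP_CONFIG[step_type]  # KeyError: 'extract-postgres'
```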

ScottWang commented 8 years ago

I think I figured it out. The pip release does not ship the extract-postgres module, so I checked out the repo and installed the package from source, and the above error went away.

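For reference, what I did was roughly this (exact commands from memory):

```sh
# Replace the PyPI release with an install from the current source tree.
pip uninstall dataduct
git clone https://github.com/coursera/dataduct.git
cd dataduct
python setup.py install
```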
But now I am getting a different error:

```
~/Downloads/dataduct$ dataduct pipeline validate ~/GitSrc/third_party/dataduct/examples/test_postgres.yaml
[INFO]: Pipeline scheduled to start at 2016-02-11T00:55:00
[ERROR]: Error creating step of class ExtractPostgresStep, step_param {'worker_group': 'testdpwg', 'schedule': <dataduct.pipeline.schedule.Schedule object at 0x1049f3dd0>, 'max_retries': 0, 'sql': 'select id from associations limit 10\n', 's3_data_dir': <dataduct.s3.s3_path.S3Path object at 0x1049f3f90>, 'required_steps': [], 's3_source_dir': <dataduct.s3.s3_path.S3Path object at 0x1049f3f50>, 'resource': <dataduct.pipeline.ec2_resource.Ec2Resource object at 0x1049f3e10>, 'id': 'ExtractPostgresStep0', 's3_log_dir': <dataduct.s3.s3_log_path.S3LogPath object at 0x1049f3fd0>, 'output_path': 's3://discovery-import/datapipeline/test/one.txt'}
Traceback (most recent call last):
  File "/usr/local/bin/dataduct", line 5, in <module>
    pkg_resources.run_script('dataduct==0.5.0', 'dataduct')
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/pkg_resources.py", line 492, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/pkg_resources.py", line 1357, in run_script
    exec(script_code, namespace, namespace)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/pkg_resources.py", line 47, in exec_
    exec("""exec _code_ in _globs_, _locs_""")
  File "<string>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/dataduct-0.5.0-py2.7.egg/EGG-INFO/scripts/dataduct", line 347, in <module>
  File "/Library/Python/2.7/site-packages/dataduct-0.5.0-py2.7.egg/EGG-INFO/scripts/dataduct", line 337, in main
  File "/Library/Python/2.7/site-packages/dataduct-0.5.0-py2.7.egg/EGG-INFO/scripts/dataduct", line 80, in pipeline_actions
  File "/Library/Python/2.7/site-packages/dataduct-0.5.0-py2.7.egg/EGG-INFO/scripts/dataduct", line 55, in initialize_etl_objects
  File "build/bdist.macosx-10.10-intel/egg/dataduct/etl/etl_actions.py", line 55, in create_pipeline
  File "build/bdist.macosx-10.10-intel/egg/dataduct/etl/etl_pipeline.py", line 490, in create_steps
  File "build/bdist.macosx-10.10-intel/egg/dataduct/steps/extract_postgres.py", line 73, in __init__
  File "build/bdist.macosx-10.10-intel/egg/dataduct/steps/etl_step.py", line 150, in create_pipeline_object
TypeError: __init__() got an unexpected keyword argument 'sql'
```
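My guess is that /usr/local/bin/dataduct is still resolving the 0.5.0 egg via pkg_resources rather than my source install, so an older class constructed through create_pipeline_object is being handed a sql keyword its __init__ never accepted. A toy reproduction of that kind of signature mismatch (class and parameter names invented):

```python
# Toy mismatch: the caller forwards a keyword an older __init__ doesn't define.
class OldPipelineObject(object):     # stands in for whatever the 0.5.0 egg builds
    def __init__(self, table=None):  # note: no 'sql' parameter in this version
        self.table = table

params = {'sql': 'select id from associations limit 10'}
obj = OldPipelineObject(**params)
# TypeError: __init__() got an unexpected keyword argument 'sql'
```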

sb2nov commented 8 years ago

@ScottWang Let me take a look at upgrading the pip release and see what is going on.

ScottWang commented 8 years ago

Thank you very much. Do you have any update on this?