coursera / dataduct

DataPipeline for humans.

S3_ETL_BUCKET default and mode values being concatenated #264

Open · JeremyColton opened this issue 7 years ago

JeremyColton commented 7 years ago

Hi,

Great product!

I see an issue, though. I want to specify mode-specific config, e.g. S3_ETL_BUCKET. If I don't specify this key in the default config, then when I validate my pipeline with:

dataduct pipeline validate -m production -f test2.yaml

I see the error: "KeyError: 'S3_ETL_BUCKET'". So I must include it in the default config, e.g.:

    etl:
      S3_ETL_BUCKET: ABC

But the value ABC is then joined to the mode-specific value below:

    production:
      etl:
        S3_BASE_PATH: prod

and the resulting value shows up in my pipeline, e.g. in the logging path:

    s3://ABC/prod/logs/jeremy_example_upsert/version_20170605153433
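To make the joining concrete, here is a minimal, hypothetical sketch (not dataduct's actual code; the function name and signature are my own) of how the observed URI would arise if the default bucket value and the mode's base path are simply concatenated:

```python
# Hypothetical illustration only -- not dataduct's real implementation.
def build_log_uri(etl_bucket, base_path, pipeline_name, version):
    # "ABC" comes from the default etl config, "prod" from the production mode.
    return "s3://{}/{}/logs/{}/{}".format(etl_bucket, base_path, pipeline_name, version)

print(build_log_uri("ABC", "prod", "jeremy_example_upsert", "version_20170605153433"))
# -> s3://ABC/prod/logs/jeremy_example_upsert/version_20170605153433
```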

This is a bug. Your docs say "Modes define override settings for running a pipeline", but instead of the mode value overriding the default, the two values are joined together.

Thanks so much and please help...

JeremyColton commented 7 years ago

The code fix is to edit dataduct/etl/etl_pipeline.py.

On line 39, change:

    S3_ETL_BUCKET = config.etl['S3_ETL_BUCKET']

to:

    S3_ETL_BUCKET = config.etl.get('S3_ETL_BUCKET', const.EMPTY_STR)

This allows the default S3_ETL_BUCKET to be empty, so a missing key no longer raises a KeyError.
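For reference, a minimal standalone sketch of why the change helps (plain dict behaviour; the etl_config dict below is just an illustration, not dataduct's config object):

```python
# Standalone illustration of dict indexing vs. dict.get with a default.
etl_config = {"S3_BASE_PATH": "prod"}  # note: no S3_ETL_BUCKET key

try:
    bucket = etl_config["S3_ETL_BUCKET"]      # current code path: raises KeyError
except KeyError as exc:
    print("KeyError:", exc)

bucket = etl_config.get("S3_ETL_BUCKET", "")  # proposed fix: falls back to ""
print(repr(bucket))                           # -> ''
```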