coursera / dataduct

DataPipeline for humans.

Environment variables support in pipeline definition #260

Closed penghou620 closed 7 years ago

penghou620 commented 7 years ago

Use environment variables in pipeline definition

Our motivation for adding this feature:

For example, we need to copy files from FTP to S3, but the destination bucket depends on the file type. We created a generic script to do the copy and control the destination bucket with an environment variable.

-   step_type: transform
    name: copy_ftp_files
    script_name: copy_ftp_files.py
    script_directory: scripts
    script_arguments:
    -   --FTP_URL=<FTP_URL>
    -   --FTP_USERNAME=<FTP_USERNAME>
    -   --FTP_PASSWD=<FTP_PASSWD>
    -   --FILE_TYPE=<FILE_TYPE>
    -   --DEST_BUCKET=<DEST_BUCKET>
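
To illustrate the idea, here is a minimal sketch of how `<NAME>` placeholders in a step definition could be expanded from the environment. It is not taken from this PR's diff or from dataduct itself: the function name `expand_env_placeholders` and the exact placeholder pattern are assumptions made for illustration.

```python
import os
import re

# Assumed placeholder syntax: <NAME>, as in the step definition above.
PLACEHOLDER = re.compile(r"<([A-Z_][A-Z0-9_]*)>")


def expand_env_placeholders(value, env=None):
    """Replace each <NAME> in `value` with env[NAME].

    Hypothetical helper, not part of dataduct. Raises KeyError when a
    referenced variable is not set, so a missing value fails loudly
    instead of leaving the placeholder in the argument.
    """
    env = os.environ if env is None else env

    def substitute(match):
        name = match.group(1)
        if name not in env:
            raise KeyError("environment variable %s is not set" % name)
        return env[name]

    return PLACEHOLDER.sub(substitute, value)
```

With a mapping such as `{"DEST_BUCKET": "my-bucket"}` passed as `env`, the argument `--DEST_BUCKET=<DEST_BUCKET>` would become `--DEST_BUCKET=my-bucket`. Note that values expanded this way end up in the rendered pipeline definition, which matters for credentials such as `FTP_PASSWD`.
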
coveralls commented 7 years ago

Coverage Status

Changes Unknown when pulling db5fdaca610dcc07ab8e1509427e533e2e6d48a3 on loadsmart:develop into on coursera:develop.

seguschin commented 7 years ago

How does that differ from calling getenv(args.var_name) directly in the script (copy_ftp_files.py)?
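
A minimal sketch of that alternative, assuming the script receives the name of the variable rather than its value. The `--var_name` flag mirrors `args.var_name` above; the function name `resolve_from_env` and the error message are hypothetical, and the real copy_ftp_files.py is not shown in this thread.

```python
import argparse
import os


def resolve_from_env(argv, env=None):
    """Parse --var_name and return the value of that environment variable.

    Hypothetical helper for a script like copy_ftp_files.py. Only the
    variable's name appears in the pipeline definition; the value is
    read when the script runs.
    """
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser()
    parser.add_argument("--var_name", required=True,
                        help="name of the environment variable to read")
    args = parser.parse_args(argv)

    value = env.get(args.var_name)
    if value is None:
        # parser.error prints a usage message and exits with status 2.
        parser.error("environment variable %s is not set" % args.var_name)
    return value
```

The step definition would then pass something like `--var_name=DEST_BUCKET`, and the value never has to be written into the definition.
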

penghou620 commented 7 years ago

@seguschin I think your solution is better. It's also more secure than expanding environment variables in the pipeline definition. I'll close the PR. Thanks!