bcgov / GDX-Analytics-microservice

The collection of GDX-Analytics Python microservices used to load and process data between systems and services.
Apache License 2.0
2 stars 0 forks source link

Gdxdsd 5611 add quotes around text option to redshift to s3 #196

Closed doughon closed 1 year ago

doughon commented 1 year ago

This PR does the following:

  1. Adds redshift_to_s3 config option to set ADDQUOTES in the UNLOAD, and default to enabled
  2. Adds redshift_to_s3 config option to set ESCAPE in the UNLOAD, and default to disabled
  3. Update existing configs to use the options that were set before these changes

Testing notes:

There will three sections to the testing

  1. Reviewing the changes to the file
  2. Testing the functionality of the ADDQUOTES and ESCAPE options
  3. Checking to see that the current data is produced with the same settings as before these changes

These tests will use modified configs that do not appear in the PR. You can view these modified test configs in the ticket

Testing instructions:

  1. Review README.md for understanding and clarity
  2. Review the changes in redshift_to_s3.py:
  3. Review the configs in config.d to make sure "addquotes" is set to False and that "escape" is missing (thereby defaulting to false):
  4. Compare the configs in config.d with the test configs in the ticket, making sure that the only change being that 'doug_test/GDXDSD-5611/' is put at the start of the "directory" option
  5. Log into the ec2 instance through the following commands
    awsmfa prod <AWS OTP>
    microservice_ssm
    cd /home/microservice/branch/GDXDSD-5611-add-quotes-around-text-option-to-redshift-to-s3/redshift_to_s3
  6. Run the following commands and compare its output to what's expected:
    pipenv run python redshift_to_s3.py -c config.d/GDXDSD-5611-default.json && pipenv run python redshift_to_s3.py -c config.d/GDXDSD-5611-yes-addquotes-no-escape.json && pipenv run python redshift_to_s3.py -c config.d/GDXDSD-5611-no-addquotes-yes-escape.json && pipenv run python redshift_to_s3.py -c config.d/GDXDSD-5611-yes-addquotes-yes-escape.json && pipenv run python redshift_to_s3.py -c config.d/GDXDSD-5611-no-addquotes-no-escape.json
    
    ***The microservice ran successfully***

Report: redshift_to_s3.py

Config: config.d/GDXDSD-5611-default.json

DML: GDXDSD-5611.sql

Microservice started at: 2023-07-31 16:30:29-0700 (PDT), ended at: 2023-07-31 16:30:30-0700 (PDT), elapsing: 0:00:01.331193.

Objects loaded to S3 /batch: 1/1 Objects successfully loaded to S3 /batch: 1

List of objects successfully loaded to S3 /batch

  1. processed/batch/client/doug_test/GDXDSD-5611/default/GDXDSD-5611_default_20230731T233029

Objects to store: 1 Objects stored to s3 /client: 1

List of objects stored to S3 /client: 1: client/doug_test/GDXDSD-5611/default/GDXDSD-5611_default_20230731T233029_part000.csv

Objects to process: 1 Objects processed to s3 /good: 1

List of objects processed to S3 /good: 1: processed/good/client/doug_test/GDXDSD-5611/default/GDXDSD-5611_default_20230731T233029_part000

7. Check to see if the files appear in the s3 processed good bucket: 
- default settings: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5611/default/&showversions=false
- ADDQUOTES disabled, ESCAPE disabled: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5611/no-addquotes-no-escape/&showversions=false
- ADDQUOTES disabled, ESCAPE enabled: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5611/no-addquotes-yes-escape/&showversions=false
- ADDQUOTES enabled, ESCAPE disabled: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5611/yes-addquotes-no-escape/&showversions=false
- ADDQUOTES enabled, ESCAPE enabled: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5611/yes-addquotes-yes-escape/&showversions=false
8. Check to see if the files appear in the s3 processed batch bucket:
- default settings: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/batch/client/doug_test/GDXDSD-5611/default/&showversions=false
- ADDQUOTES disabled, ESCAPE disabled: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/batch/client/doug_test/GDXDSD-5611/no-addquotes-no-escape/&showversions=false
- ADDQUOTES disabled, ESCAPE enabled: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/batch/client/doug_test/GDXDSD-5611/no-addquotes-yes-escape/&showversions=false
- ADDQUOTES enabled, ESCAPE disabled: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/batch/client/doug_test/GDXDSD-5611/yes-addquotes-no-escape/&showversions=false
- ADDQUOTES enabled, ESCAPE enabled: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/batch/client/doug_test/GDXDSD-5611/yes-addquotes-yes-escape/&showversions=false
9. Check to see if the files appear in the s3 client bucket and download your generated file: 
- default settings: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=client/doug_test/GDXDSD-5611/default/&showversions=false
- ADDQUOTES disabled, ESCAPE disabled: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=client/doug_test/GDXDSD-5611/no-addquotes-no-escape/&showversions=false
- ADDQUOTES disabled, ESCAPE enabled: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=client/doug_test/GDXDSD-5611/no-addquotes-yes-escape/&showversions=false
- ADDQUOTES enabled, ESCAPE disabled: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=client/doug_test/GDXDSD-5611/yes-addquotes-no-escape/&showversions=false
- ADDQUOTES enabled, ESCAPE enabled: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=client/doug_test/GDXDSD-5611/yes-addquotes-yes-escape/&showversions=false
10. Run the following commands and compare its output to what's expected:

pipenv run python redshift_to_s3.py -c config.d/sdpr_last_full_day-test.json pipenv run python redshift_to_s3.py -c config.d/sdpr_last_full_day_incremental-test.json pipenv run python redshift_to_s3.py -c config.d/sdpr_hourly-test.json pipenv run python redshift_to_s3.py -c config.d/sbc-sdpr_last_full_day_incremental-test.json pipenv run python redshift_to_s3.py -c config.d/pmrp_qdata_dates-test.json pipenv run python redshift_to_s3.py -c config.d/pmrp_qdata_daily-test.json pipenv run python redshift_to_s3.py -c config.d/pmrp_all-test.json pipenv run python redshift_to_s3.py -c config.d/pmrp_qdata_range-test.json pipenv run python redshift_to_s3.py -c config.d/sbc-sdpr_historical-test.json pipenv run python redshift_to_s3.py -c config.d/sdpr_historical-test.json pipenv run python redshift_to_s3.py -c config.d/pmrp_date_range-test.json pipenv run python redshift_to_s3.py -c config.d/pmrp_max_date-test.json


11. Check to see if the files appear in the s3 processed good bucket: 
- sdpr_last_full_day-test: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5611/theq_sdpr/daily/&showversions=false
- sdpr_last_full_day_incremental-test: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5611/theq_sdpr/daily-incremental/v01-Mar_2023_incremental/&showversions=false
- sdpr_hourly-test: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5611/theq_sdprabi/sdpr_hourly/v01-Mar_2023_incremental/&showversions=false
- sbc-sdpr_last_full_day_incremental-test: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5611/theq_sdpr/sbc-sdpr_daily-incremental/v03-May_2023_folder_change/&showversions=false
- pmrp_qdata_dates-test: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5611/pmrp_qdata/dates/Jun_2022_change/&showversions=false
- pmrp_qdata_daily-test: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5611/pmrp_qdata/daily/Jun_2022_change/&showversions=false
- pmrp_all-test: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5611/pmrp_gdx/pmrp_all/&showversions=false
- pmrp_qdata_range-test: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5611/pmrp_qdata/range/Jun_2022_change/&showversions=false
- sbc-sdpr_historical-test: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5611/theq_sdpr/sbc-sdpr_historical/v03-May_2023_folder_change/&showversions=false
- sdpr_historical-test: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5611/theq_sdpr/historical/&showversions=false
- pmrp_date_range-test: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5611/pmrp_gdx/pmrp_date_range/&showversions=false
- pmrp_max_date-test: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5611/pmrp_gdx/pmrp_max_date/&showversions=false
12. Check to see if the files appear in the s3 processed batch bucket: 
- sdpr_last_full_day-test: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/batch/client/doug_test/GDXDSD-5611/theq_sdpr/daily/&showversions=false
- sdpr_last_full_day_incremental-test: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/batch/client/doug_test/GDXDSD-5611/theq_sdpr/daily-incremental/v01-Mar_2023_incremental/&showversions=false
- sdpr_hourly-test: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/batch/client/doug_test/GDXDSD-5611/theq_sdprabi/sdpr_hourly/v01-Mar_2023_incremental/&showversions=false
- sbc-sdpr_last_full_day_incremental-test: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/batch/client/doug_test/GDXDSD-5611/theq_sdpr/sbc-sdpr_daily-incremental/v03-May_2023_folder_change/&showversions=false
- pmrp_qdata_dates-test: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/batch/client/doug_test/GDXDSD-5611/pmrp_qdata/dates/Jun_2022_change/&showversions=false
- pmrp_qdata_daily-test: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/batch/client/doug_test/GDXDSD-5611/pmrp_qdata/daily/Jun_2022_change/&showversions=false
- pmrp_all-test: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/batch/client/doug_test/GDXDSD-5611/pmrp_gdx/pmrp_all/&showversions=false
- pmrp_qdata_range-test: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/batch/client/doug_test/GDXDSD-5611/pmrp_qdata/range/Jun_2022_change/&showversions=false
- sbc-sdpr_historical-test: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/batch/client/doug_test/GDXDSD-5611/theq_sdpr/sbc-sdpr_historical/v03-May_2023_folder_change/&showversions=false
- sdpr_historical-test: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/batch/client/doug_test/GDXDSD-5611/theq_sdpr/historical/&showversions=false
- pmrp_date_range-test: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/batch/client/doug_test/GDXDSD-5611/pmrp_gdx/pmrp_date_range/&showversions=false
- pmrp_max_date-test: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/batch/client/doug_test/GDXDSD-5611/pmrp_gdx/pmrp_max_date/&showversions=false
13. Check to see if the files appear in the s3 client bucket and download both files to compare and see if the file formats are the same: 
- sdpr_last_full_day-test: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=client/doug_test/GDXDSD-5611/theq_sdpr/daily/&showversions=false
- sdpr_last_full_day_incremental-test: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=client/doug_test/GDXDSD-5611/theq_sdpr/daily-incremental/v01-Mar_2023_incremental/&showversions=false
- sdpr_hourly-test: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=client/doug_test/GDXDSD-5611/theq_sdprabi/sdpr_hourly/v01-Mar_2023_incremental/&showversions=false
- sbc-sdpr_last_full_day_incremental-test: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=client/doug_test/GDXDSD-5611/theq_sdpr/sbc-sdpr_daily-incremental/v03-May_2023_folder_change/&showversions=false
- pmrp_qdata_dates-test: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=client/doug_test/GDXDSD-5611/pmrp_qdata/dates/Jun_2022_change/&showversions=false
- pmrp_qdata_daily-test: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=client/doug_test/GDXDSD-5611/pmrp_qdata/daily/Jun_2022_change/&showversions=false
- pmrp_all-test: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=client/doug_test/GDXDSD-5611/pmrp_gdx/pmrp_all/&showversions=false
- pmrp_qdata_range-test: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=client/doug_test/GDXDSD-5611/pmrp_qdata/range/Jun_2022_change/&showversions=false
- sbc-sdpr_historical-test: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=client/doug_test/GDXDSD-5611/theq_sdpr/sbc-sdpr_historical/v03-May_2023_folder_change/&showversions=false
- sdpr_historical-test: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=client/doug_test/GDXDSD-5611/theq_sdpr/historical/&showversions=false
- pmrp_date_range-test: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=client/doug_test/GDXDSD-5611/pmrp_gdx/pmrp_date_range/&showversions=false
- pmrp_max_date-test: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=client/doug_test/GDXDSD-5611/pmrp_gdx/pmrp_max_date/&showversions=false