bcgov / GDX-Analytics-microservice

The collection of GDX-Analytics Python microservices used to load and process data between systems and services.
Apache License 2.0

GDXDSD-5355: Add optional file extension to redshift_to_s3 #164

Closed by doughon 1 year ago

doughon commented 1 year ago

This PR does the following:

Changes redshift_to_s3.py so that it:

  1. Unloads data into S3 under processed/batch/client/client_folder/...
  2. Copies the data from processed/batch/client/client_folder/... into client/client_folder/... in S3, optionally adding a file extension during the copy
  3. If step 2 succeeds, copies the data from processed/batch/client/client_folder/... into processed/good/client/client_folder/...
  4. If step 2 fails, copies the data from processed/batch/client/client_folder/... into processed/bad/client/client_folder/...
  5. Uses more descriptive parameter names in its configs
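A minimal sketch of the routing in steps 1 to 4, for reviewers. It assumes each destination key is formed by swapping the processed/batch/ prefix for client/, processed/good/ or processed/bad/, which is consistent with the keys in the sample report below. The helper names (strip_batch, client_key, route) are hypothetical, and a plain dict stands in for the S3 bucket; the real script presumably performs the copies with boto3 (for example S3.Client.copy_object), not with anything shown here.

```python
from typing import Callable, Dict, List, Optional

# Prefixes come from the PR description; the helpers below are hypothetical.
BATCH = "processed/batch/"
GOOD = "processed/good/"
BAD = "processed/bad/"


def strip_batch(key: str) -> str:
    """Return the client/... tail of a processed/batch/client/... key."""
    if not key.startswith(BATCH):
        raise ValueError(f"not a batch key: {key}")
    return key[len(BATCH):]


def client_key(batch_key: str, extension: Optional[str] = None) -> str:
    """Step 2 destination: the client/... key, optionally with a file extension."""
    key = strip_batch(batch_key)
    if extension:
        key += extension if extension.startswith(".") else "." + extension
    return key


def route(bucket: Dict[str, bytes], batch_keys: List[str],
          extension: Optional[str] = None,
          copy: Optional[Callable[[str, str], None]] = None) -> Dict[str, List[str]]:
    """Steps 2-4 against a dict that stands in for the S3 bucket."""
    def dict_copy(src: str, dst: str) -> None:
        bucket[dst] = bucket[src]

    copy = copy or dict_copy
    archived: Dict[str, List[str]] = {"good": [], "bad": []}
    for key in batch_keys:
        tail = strip_batch(key)
        try:
            copy(key, client_key(key, extension))  # step 2: batch -> client
            stage = "good"                          # step 3: the copy succeeded
        except (KeyError, OSError):
            stage = "bad"                           # step 4: the copy failed
        dest = (GOOD if stage == "good" else BAD) + tail
        bucket[dest] = bucket[key]                  # archive the batch object (directly on the dict)
        archived[stage].append(dest)
    return archived
```

With extension="csv", the batch object processed/batch/client/.../pmrp_qdata_20230223T173047_part000 maps to client/.../pmrp_qdata_20230223T173047_part000.csv and is archived under processed/good/ without the extension, which is the pattern the sample report shows.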

Changes s3_to_sfts.py so that it:

  1. Accepts separate source and archive prefixes
  2. Splits the single path parameter in each config into separate source and archive paths
  3. Archives to paths that no longer overlap with the archive paths used by redshift_to_s3.py
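A sketch of how separate source and archive prefixes could map an object to its archive location, assuming the archive key is simply the source key with the source prefix replaced. archive_key, normalize and the example prefixes are hypothetical; the actual config parameter names are in the configs attached to the ticket and are not shown in this description.

```python
def normalize(prefix: str) -> str:
    """Ensure a non-empty prefix ends with '/' so it behaves like a folder."""
    return prefix if not prefix or prefix.endswith("/") else prefix + "/"


def archive_key(source_key: str, source_prefix: str, archive_prefix: str) -> str:
    """Hypothetical helper: re-root a key from source_prefix to archive_prefix."""
    source_prefix, archive_prefix = normalize(source_prefix), normalize(archive_prefix)
    # Keeping the two apart is the point of splitting the old single path parameter.
    if source_prefix == archive_prefix:
        raise ValueError("source and archive prefixes should differ")
    if not source_key.startswith(source_prefix):
        raise ValueError(f"{source_key!r} is not under {source_prefix!r}")
    return archive_prefix + source_key[len(source_prefix):]
```

Because the archive prefix is configured independently, it can point at a folder that redshift_to_s3.py never writes to, instead of being derived from the source path.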

Testing these changes requires modifications to the config files. The modified configs are in the zip file attached to the ticket.

Testing redshift_to_s3

  1. Review its README to see if it makes sense or if anything needs to be changed
  2. Log in to the EC2 instance with the following commands:
    awsmfa prod <AWS OTP>
    microservice_ssm
    cd /home/microservice/branch/GDXDSD-5355-add-optional-file-extention-to-redshift_to_s3/redshift_to_s3
  3. Run the following commands and compare their output to the expected output. The pmrp_qdata_range and sdpr_historical runs each take several minutes.
    pipenv run python redshift_to_s3.py -c config.d/pmrp_all.json
    pipenv run python redshift_to_s3.py -c config.d/pmrp_date_range.json
    pipenv run python redshift_to_s3.py -c config.d/pmrp_max_date.json
    pipenv run python redshift_to_s3.py -c config.d/pmrp_qdata_daily.json
    pipenv run python redshift_to_s3.py -c config.d/pmrp_qdata_dates.json
    pipenv run python redshift_to_s3.py -c config.d/pmrp_qdata_range.json
    pipenv run python redshift_to_s3.py -c config.d/sdpr_historical.json
    pipenv run python redshift_to_s3.py -c config.d/sdpr_last_full_day.json
    
    ***The microservice ran successfully***

Report: redshift_to_s3.py

Config: config.d/pmrp_qdata_daily.json

DML: pmrp_qdata_daily.sql

Microservice started at: 2023-02-23 09:30:47-0800 (PST), ended at: 2023-02-23 09:30:48-0800 (PST), elapsing: 0:00:01.386101.

Objects loaded to S3 /batch: 1/1
Objects successfully loaded to S3 /batch: 1

List of objects successfully loaded to S3 /batch

  1. processed/batch/client/doug_test/GDXDSD-5355/pmrp_qdata/daily/Jun_2022_change/pmrp_qdata_20230223T173047

Objects to store: 1
Objects stored to s3 /client: 1

List of objects stored to S3 /client:
1: client/doug_test/GDXDSD-5355/pmrp_qdata/daily/Jun_2022_change/pmrp_qdata_20230223T173047_part000.csv

Objects to process: 1
Objects processed to s3 /good: 1

List of objects processed to S3 /good:
1: processed/good/client/doug_test/GDXDSD-5355/pmrp_qdata/daily/Jun_2022_change/pmrp_qdata_20230223T173047_part000
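In the sample report the object name suffix, 20230223T173047, equals the run's start time of 09:30:47 PST expressed in UTC (17:30:47). Assuming that is how the suffix is generated (an inference from this one report; the PR does not say so), the name stem can be reproduced as below. object_stem is a hypothetical helper, and the _part000 suffix that follows the stem is not modelled.

```python
from datetime import datetime, timedelta, timezone


def object_stem(name: str, started: datetime) -> str:
    """Hypothetical helper: '<name>_<start time in UTC as YYYYMMDDTHHMMSS>'."""
    if started.tzinfo is None:
        raise ValueError("started must be timezone-aware")
    return f"{name}_{started.astimezone(timezone.utc):%Y%m%dT%H%M%S}"


# Start time from the sample report: 2023-02-23 09:30:47-0800 (PST)
PST = timezone(timedelta(hours=-8))
print(object_stem("pmrp_qdata", datetime(2023, 2, 23, 9, 30, 47, tzinfo=PST)))
# prints pmrp_qdata_20230223T173047
```

Knowing the expected stem makes it easier to pick the right object out of each prefix in the checks that follow.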

4. Check that the files appear under the processed/batch/ prefix in S3:
- pmrp_all: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/batch/client/doug_test/GDXDSD-5355/pmrp_gdx/pmrp_all/&showversions=false
- pmrp_date_range: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/batch/client/doug_test/GDXDSD-5355/pmrp_gdx/pmrp_date_range/&showversions=false
- pmrp_max_date: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/batch/client/doug_test/GDXDSD-5355/pmrp_gdx/pmrp_max_date/&showversions=false
- pmrp_qdata_daily: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/batch/client/doug_test/GDXDSD-5355/pmrp_qdata/daily/Jun_2022_change/&showversions=false
- pmrp_qdata_dates: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/batch/client/doug_test/GDXDSD-5355/pmrp_qdata/dates/Jun_2022_change/&showversions=false
- pmrp_qdata_range: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/batch/client/doug_test/GDXDSD-5355/pmrp_qdata/range/Jun_2022_change/&showversions=false
- sdpr_historical: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/batch/client/doug_test/GDXDSD-5355/theq_sdpr/historical/&showversions=false
- sdpr_last_full_day: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/batch/client/doug_test/GDXDSD-5355/theq_sdpr/daily/&showversions=false
5. Check that the files appear under the client/ prefix in S3:
- pmrp_all: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=client/doug_test/GDXDSD-5355/pmrp_gdx/pmrp_all/&showversions=false
- pmrp_date_range: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=client/doug_test/GDXDSD-5355/pmrp_gdx/pmrp_date_range/&showversions=false
- pmrp_max_date: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=client/doug_test/GDXDSD-5355/pmrp_gdx/pmrp_max_date/&showversions=false
- pmrp_qdata_daily: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=client/doug_test/GDXDSD-5355/pmrp_qdata/daily/Jun_2022_change/&showversions=false
- pmrp_qdata_dates: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=client/doug_test/GDXDSD-5355/pmrp_qdata/dates/Jun_2022_change/&showversions=false
- pmrp_qdata_range: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=client/doug_test/GDXDSD-5355/pmrp_qdata/range/Jun_2022_change/&showversions=false
- sdpr_historical: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=client/doug_test/GDXDSD-5355/theq_sdpr/historical/&showversions=false
- sdpr_last_full_day: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=client/doug_test/GDXDSD-5355/theq_sdpr/daily/&showversions=false
6. Check that the files appear under the processed/good/ prefix in S3:
- pmrp_all: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5355/pmrp_gdx/pmrp_all/&showversions=false
- pmrp_date_range: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5355/pmrp_gdx/pmrp_date_range/&showversions=false
- pmrp_max_date: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5355/pmrp_gdx/pmrp_max_date/&showversions=false
- pmrp_qdata_daily: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5355/pmrp_qdata/daily/Jun_2022_change/&showversions=false
- pmrp_qdata_dates: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5355/pmrp_qdata/dates/Jun_2022_change/&showversions=false
- pmrp_qdata_range: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5355/pmrp_qdata/range/Jun_2022_change/&showversions=false
- sdpr_historical: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5355/theq_sdpr/historical/&showversions=false
- sdpr_last_full_day: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5355/theq_sdpr/daily/&showversions=false
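Steps 4 to 6 can also be checked without clicking through the console. The sketch below is an assumption, not part of the PR: it pulls the bucket and prefix out of a console URL like the ones above, derives the batch, client and good prefixes for one config, and lists the keys under each with boto3's list_objects_v2 paginator. Listing needs boto3 and valid AWS credentials (for example from the awsmfa session in step 2); bucket_and_prefix, stage_prefixes and list_keys are hypothetical names.

```python
from typing import Dict, List, Tuple
from urllib.parse import parse_qs, urlparse


def bucket_and_prefix(console_url: str) -> Tuple[str, str]:
    """Extract (bucket, prefix) from an S3 console URL of the form used above."""
    parsed = urlparse(console_url)
    bucket = parsed.path.rstrip("/").split("/")[-1]
    prefix = parse_qs(parsed.query).get("prefix", [""])[0]
    return bucket, prefix


def stage_prefixes(client_prefix: str) -> Dict[str, str]:
    """The three prefixes checked in steps 4 to 6 for one config."""
    return {
        "batch": "processed/batch/" + client_prefix,
        "client": client_prefix,
        "good": "processed/good/" + client_prefix,
    }


def list_keys(bucket: str, prefix: str) -> List[str]:
    """List every object key under a prefix (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the pure helpers above work without boto3

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    keys: List[str] = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys


# Usage (needs credentials):
#   bucket, _ = bucket_and_prefix(url_from_the_list_above)
#   for stage, p in stage_prefixes("client/doug_test/GDXDSD-5355/pmrp_gdx/pmrp_all/").items():
#       print(stage, list_keys(bucket, p))
```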

Testing s3_to_sfts

1. Review its README to see if it makes sense or if anything needs to be changed
2. Change to the s3_to_sfts directory:

cd /home/microservice/branch/GDXDSD-5355-add-optional-file-extention-to-redshift_to_s3/s3_to_sfts

3. Run the following commands and compare their output to the expected output. The pmrp_qdata_range run takes several minutes, and there are no commands for SDPR.

pipenv run python s3_to_sfts.py -c config.d/pmrp_all.json
pipenv run python s3_to_sfts.py -c config.d/pmrp_date_range.json
pipenv run python s3_to_sfts.py -c config.d/pmrp_max_date.json
pipenv run python s3_to_sfts.py -c config.d/pmrp_qdata_daily.json
pipenv run python s3_to_sfts.py -c config.d/pmrp_qdata_dates.json
pipenv run python s3_to_sfts.py -c config.d/pmrp_qdata_range.json
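If running the six configs one at a time gets tedious, a small driver can run them in order. This is a sketch, not part of the PR: build_commands and run_all are hypothetical, it has to be started from the s3_to_sfts directory, and it assumes the script exits with a non-zero status when a run fails, which the PR does not state. The runner argument only exists so the loop can be exercised without pipenv.

```python
import subprocess
from typing import Callable, List, Sequence

# The six configs listed in step 3 (there are no SDPR configs for s3_to_sfts).
SFTS_CONFIGS = [
    "pmrp_all", "pmrp_date_range", "pmrp_max_date",
    "pmrp_qdata_daily", "pmrp_qdata_dates", "pmrp_qdata_range",
]


def build_commands(script: str, configs: Sequence[str]) -> List[List[str]]:
    """One 'pipenv run python <script> -c config.d/<name>.json' argv per config."""
    return [["pipenv", "run", "python", script, "-c", f"config.d/{name}.json"]
            for name in configs]


def run_all(commands: List[List[str]],
            runner: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> List[str]:
    """Run each command in order; return the config paths whose run exited non-zero."""
    failed: List[str] = []
    for argv in commands:
        if runner(argv).returncode != 0:
            failed.append(argv[-1])
    return failed


# Usage, from the s3_to_sfts directory:
#   print(run_all(build_commands("s3_to_sfts.py", SFTS_CONFIGS)))
```

The same two functions would cover the redshift_to_s3 runs by passing "redshift_to_s3.py" and that section's eight config names.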

Report: s3_to_sfts.py

Config: config.d/pmrp_qdata_range.json

Microservice started at: 2023-02-24 13:05:25-0800 (PST), ended at: 2023-02-24 13:07:18-0800 (PST), elapsing: 0:01:52.582032.

Items to process: 1
Objects successfully processed to s3: 1
Objects unsuccessfully processed to s3: 0
Objects successfully processed to sfts: 1

Objects loaded to S3 /good:

1: processed/good/client/doug_test/GDXDSD-5355/qdata-sfts_sbc/range/Jun_2022_change/pmrp_qdata_20180508_20220622_20230224T205455_part000
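To compare a report against what's expected without reading every line, its "label: number" lines can be collected into a dict. A minimal sketch, assuming each count is printed on its own line; report_counts and all_delivered are hypothetical helpers, and the label strings are copied from the sample s3_to_sfts report.

```python
from typing import Dict


def report_counts(report: str) -> Dict[str, int]:
    """Collect 'label: number' lines; other lines (paths, timestamps) are skipped."""
    counts: Dict[str, int] = {}
    for line in report.splitlines():
        label, sep, value = line.strip().rpartition(": ")
        if sep and value.isdigit():
            counts[label] = int(value)
    return counts


def all_delivered(counts: Dict[str, int]) -> bool:
    """True when every item reached SFTS and none failed (s3_to_sfts report labels)."""
    items = counts.get("Items to process")
    return (items is not None
            and counts.get("Objects unsuccessfully processed to s3", 0) == 0
            and counts.get("Objects successfully processed to sfts") == items)
```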


4. Check that the files appear under the processed/good/ prefix in S3:
- pmrp_all: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5355/google-mybusiness-sfts_sbc/pmrp_all/&showversions=false
- pmrp_date_range: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5355/google-mybusiness-sfts_sbc/pmrp_date_range/&showversions=false
- pmrp_max_date: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5355/google-mybusiness-sfts_sbc/pmrp_max_date/&showversions=false
- pmrp_qdata_daily: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5355/qdata-sfts_sbc/daily/Jun_2022_change/&showversions=false
- pmrp_qdata_dates: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5355/qdata-sfts_sbc/dates/Jun_2022_change/&showversions=false
- pmrp_qdata_range: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5355/qdata-sfts_sbc/range/Jun_2022_change/&showversions=false
5. Check that the files appear in SFTS:
- pmrp_all: https://filetransfer.gov.bc.ca/human.aspx?r=1252781459&arg06=937431563&arg12=filelist
- pmrp_date_range: https://filetransfer.gov.bc.ca/human.aspx?r=1252781459&arg06=937431563&arg12=filelist
- pmrp_max_date: https://filetransfer.gov.bc.ca/human.aspx?r=1252781459&arg06=937431563&arg12=filelist
- pmrp_qdata_daily: https://filetransfer.gov.bc.ca/human.aspx?r=681390743&arg06=937495448&arg12=filelist
- pmrp_qdata_dates: https://filetransfer.gov.bc.ca/human.aspx?r=681390743&arg06=937495448&arg12=filelist
- pmrp_qdata_range: https://filetransfer.gov.bc.ca/human.aspx?r=681390743&arg06=937495448&arg12=filelist