This PR does the following:
Changes redshift_to_s3.py to:
1. Unload data into s3 processed/batch/client/client_folder/...
2. Copy the data from processed/batch/client/client_folder/... into s3 client/client_folder/... (optionally adding the file extension while doing so)
3. If step 2 succeeds, copy the data from processed/batch/client/client_folder/... into processed/good/client/client_folder/...
4. If step 2 fails, copy the data from processed/batch/client/client_folder/... into processed/bad/client/client_folder/...
5. Use more descriptive parameter names in the configs
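The batch, client, good, and bad paths described above can be sketched as a key-mapping step. This is only an illustration under assumptions, not the code in redshift_to_s3.py: the function names `client_key` and `archive_key` and the `extension` parameter are hypothetical, and the real script may derive its paths differently.

```python
from typing import Optional

BATCH_PREFIX = "processed/batch/"


def _require_batch_key(batch_key: str) -> None:
    """Reject keys that are not under processed/batch/."""
    if not batch_key.startswith(BATCH_PREFIX):
        raise ValueError(f"not a batch key: {batch_key}")


def client_key(batch_key: str, extension: Optional[str] = None) -> str:
    """Map a processed/batch/... key to its client/... key (step 2)."""
    _require_batch_key(batch_key)
    key = batch_key[len(BATCH_PREFIX):]
    # Optionally append the file extension, e.g. ".csv" (hypothetical parameter).
    return key + extension if extension else key


def archive_key(batch_key: str, copied_ok: bool) -> str:
    """Map a batch key to processed/good/... or processed/bad/... (steps 3 and 4)."""
    _require_batch_key(batch_key)
    outcome = "good" if copied_ok else "bad"
    return f"processed/{outcome}/" + batch_key[len(BATCH_PREFIX):]


if __name__ == "__main__":
    batch = "processed/batch/client/client_folder/part000"
    print(client_key(batch, ".csv"))
    print(archive_key(batch, copied_ok=True))
    print(archive_key(batch, copied_ok=False))
```

Note that in this sketch only the client copy receives the extension; the archived copy keeps the original key name.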
Changes s3_to_sfts to:
- Add the ability to specify separate source and archive prefixes
- Split the single path parameter in the configs into separate source and archive paths
- Change the archive paths so they no longer write to the same archive paths that are used by redshift_to_s3.py
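One way to confirm that the two scripts no longer share an archive location is to compare their archive prefixes. The sketch below is a hypothetical helper, not part of either script; the two prefixes are taken from the test paths used later in this description.

```python
def normalize(prefix: str) -> str:
    """Strip surrounding slashes and add one trailing slash so folder names compare whole."""
    return prefix.strip("/") + "/"


def prefixes_overlap(a: str, b: str) -> bool:
    """Return True if either prefix equals, or is nested inside, the other."""
    a, b = normalize(a), normalize(b)
    return a.startswith(b) or b.startswith(a)


# Archive prefixes used by the test configs for each script.
REDSHIFT_TO_S3_ARCHIVE = "processed/good/client/doug_test/GDXDSD-5355/pmrp_qdata/daily/Jun_2022_change/"
S3_TO_SFTS_ARCHIVE = "processed/good/client/doug_test/GDXDSD-5355/qdata-sfts_sbc/daily/Jun_2022_change/"

if __name__ == "__main__":
    print(prefixes_overlap(REDSHIFT_TO_S3_ARCHIVE, S3_TO_SFTS_ARCHIVE))
```

The trailing slash added by `normalize` keeps sibling folders with a shared name stem from being reported as overlapping.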
Testing these changes requires modifications to the config files. You can see these changes in the zip file attached to this ticket.
Testing redshift_to_s3
1. Review its README to see if it makes sense or if anything needs to be changed
2. Log into the EC2 instance using the following commands
awsmfa prod <AWS OTP>
microservice_ssm
cd /home/microservice/branch/GDXDSD-5355-add-optional-file-extention-to-redshift_to_s3/redshift_to_s3
3. Run the following commands and compare their output to what's expected. Note that the commands for pmrp_qdata_range and sdpr_historical each take minutes to run.
pipenv run python redshift_to_s3.py -c config.d/pmrp_all.json
pipenv run python redshift_to_s3.py -c config.d/pmrp_date_range.json
pipenv run python redshift_to_s3.py -c config.d/pmrp_max_date.json
pipenv run python redshift_to_s3.py -c config.d/pmrp_qdata_daily.json
pipenv run python redshift_to_s3.py -c config.d/pmrp_qdata_dates.json
pipenv run python redshift_to_s3.py -c config.d/pmrp_qdata_range.json
pipenv run python redshift_to_s3.py -c config.d/sdpr_historical.json
pipenv run python redshift_to_s3.py -c config.d/sdpr_last_full_day.json
4. Check that the files appear under the processed/batch prefix in S3:
- pmrp_all: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/batch/client/doug_test/GDXDSD-5355/pmrp_gdx/pmrp_all/&showversions=false
- pmrp_date_range: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/batch/client/doug_test/GDXDSD-5355/pmrp_gdx/pmrp_date_range/&showversions=false
- pmrp_max_date: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/batch/client/doug_test/GDXDSD-5355/pmrp_gdx/pmrp_max_date/&showversions=false
- pmrp_qdata_daily: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/batch/client/doug_test/GDXDSD-5355/pmrp_qdata/daily/Jun_2022_change/&showversions=false
- pmrp_qdata_dates: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/batch/client/doug_test/GDXDSD-5355/pmrp_qdata/dates/Jun_2022_change/&showversions=false
- pmrp_qdata_range: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/batch/client/doug_test/GDXDSD-5355/pmrp_qdata/range/Jun_2022_change/&showversions=false
- sdpr_historical: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/batch/client/doug_test/GDXDSD-5355/theq_sdpr/historical/&showversions=false
- sdpr_last_full_day: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/batch/client/doug_test/GDXDSD-5355/theq_sdpr/daily/&showversions=false
5. Check that the files appear under the client prefix in S3:
- pmrp_all: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=client/doug_test/GDXDSD-5355/pmrp_gdx/pmrp_all/&showversions=false
- pmrp_date_range: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=client/doug_test/GDXDSD-5355/pmrp_gdx/pmrp_date_range/&showversions=false
- pmrp_max_date: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=client/doug_test/GDXDSD-5355/pmrp_gdx/pmrp_max_date/&showversions=false
- pmrp_qdata_daily: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=client/doug_test/GDXDSD-5355/pmrp_qdata/daily/Jun_2022_change/&showversions=false
- pmrp_qdata_dates: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=client/doug_test/GDXDSD-5355/pmrp_qdata/dates/Jun_2022_change/&showversions=false
- pmrp_qdata_range: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=client/doug_test/GDXDSD-5355/pmrp_qdata/range/Jun_2022_change/&showversions=false
- sdpr_historical: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=client/doug_test/GDXDSD-5355/theq_sdpr/historical/&showversions=false
- sdpr_last_full_day: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=client/doug_test/GDXDSD-5355/theq_sdpr/daily/&showversions=false
6. Check that the files appear under the processed/good prefix in S3:
- pmrp_all: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5355/pmrp_gdx/pmrp_all/&showversions=false
- pmrp_date_range: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5355/pmrp_gdx/pmrp_date_range/&showversions=false
- pmrp_max_date: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5355/pmrp_gdx/pmrp_max_date/&showversions=false
- pmrp_qdata_daily: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5355/pmrp_qdata/daily/Jun_2022_change/&showversions=false
- pmrp_qdata_dates: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5355/pmrp_qdata/dates/Jun_2022_change/&showversions=false
- pmrp_qdata_range: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5355/pmrp_qdata/range/Jun_2022_change/&showversions=false
- sdpr_historical: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5355/theq_sdpr/historical/&showversions=false
- sdpr_last_full_day: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5355/theq_sdpr/daily/&showversions=false
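The console checks in steps 4 to 6 could also be scripted. The following is a minimal sketch, assuming boto3 is installed and the session has read access to the bucket; the helper names `expected_prefixes`, `count_objects`, and `main` are hypothetical and not part of this PR.

```python
from typing import Any, Dict

BUCKET = "sp-ca-bc-gov-131565110619-12-microservices"


def expected_prefixes(folder: str) -> Dict[str, str]:
    """Return the batch, client, and good prefixes for one client folder."""
    folder = folder.strip("/")
    return {
        "batch": f"processed/batch/client/{folder}/",
        "client": f"client/{folder}/",
        "good": f"processed/good/client/{folder}/",
    }


def count_objects(s3_client: Any, bucket: str, prefix: str) -> int:
    """Count the objects under a prefix using the list_objects_v2 paginator."""
    paginator = s3_client.get_paginator("list_objects_v2")
    total = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page for an empty prefix has no "Contents" entry.
        total += len(page.get("Contents", []))
    return total


def main() -> None:
    """Print object counts for the pmrp_all test folder (needs AWS credentials)."""
    import boto3  # imported here so the helpers above work without boto3

    s3 = boto3.client("s3")
    folder = "doug_test/GDXDSD-5355/pmrp_gdx/pmrp_all"
    for name, prefix in expected_prefixes(folder).items():
        print(f"{name}: {count_objects(s3, BUCKET, prefix)} object(s) under {prefix}")
```

Call `main()` from a session that already holds AWS credentials; a count of zero under any of the three prefixes would suggest that the corresponding step did not write its file.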
Testing s3_to_sfts
1. Review its README to see if it makes sense or if anything needs to be changed
2. Navigate using the following command
cd /home/microservice/branch/GDXDSD-5355-add-optional-file-extention-to-redshift_to_s3/s3_to_sfts
3. Run the following commands and compare their output to what's expected. Note that the command for pmrp_qdata_range takes minutes to run and that there are no commands for SDPR.
pipenv run python s3_to_sfts.py -c config.d/pmrp_all.json
pipenv run python s3_to_sfts.py -c config.d/pmrp_date_range.json
pipenv run python s3_to_sfts.py -c config.d/pmrp_max_date.json
pipenv run python s3_to_sfts.py -c config.d/pmrp_qdata_daily.json
pipenv run python s3_to_sfts.py -c config.d/pmrp_qdata_dates.json
pipenv run python s3_to_sfts.py -c config.d/pmrp_qdata_range.json
4. Check that the files appear under the processed/good prefix in S3:
- pmrp_all: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5355/google-mybusiness-sfts_sbc/pmrp_all/&showversions=false
- pmrp_date_range: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5355/google-mybusiness-sfts_sbc/pmrp_date_range/&showversions=false
- pmrp_max_date: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5355/google-mybusiness-sfts_sbc/pmrp_max_date/&showversions=false
- pmrp_qdata_daily: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5355/qdata-sfts_sbc/daily/Jun_2022_change/&showversions=false
- pmrp_qdata_dates: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5355/qdata-sfts_sbc/dates/Jun_2022_change/&showversions=false
- pmrp_qdata_range: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5355/qdata-sfts_sbc/range/Jun_2022_change/&showversions=false
5. Check to see if the files appear in SFTS:
- pmrp_all: https://filetransfer.gov.bc.ca/human.aspx?r=1252781459&arg06=937431563&arg12=filelist
- pmrp_date_range: https://filetransfer.gov.bc.ca/human.aspx?r=1252781459&arg06=937431563&arg12=filelist
- pmrp_max_date: https://filetransfer.gov.bc.ca/human.aspx?r=1252781459&arg06=937431563&arg12=filelist
- pmrp_qdata_daily: https://filetransfer.gov.bc.ca/human.aspx?r=681390743&arg06=937495448&arg12=filelist
- pmrp_qdata_dates: https://filetransfer.gov.bc.ca/human.aspx?r=681390743&arg06=937495448&arg12=filelist
- pmrp_qdata_range: https://filetransfer.gov.bc.ca/human.aspx?r=681390743&arg06=937495448&arg12=filelist
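The per-config commands in step 3 can be run in a loop rather than one at a time. This is a sketch only: `build_command` and `run_all` are hypothetical helpers, and it assumes the script exits with a non-zero status when a run fails, which this description does not confirm.

```python
import subprocess
from typing import Any, Callable, List, Sequence

S3_TO_SFTS_CONFIGS = [
    "pmrp_all",
    "pmrp_date_range",
    "pmrp_max_date",
    "pmrp_qdata_daily",
    "pmrp_qdata_dates",
    "pmrp_qdata_range",
]


def build_command(script: str, config_name: str) -> List[str]:
    """Build the pipenv command line for one config file."""
    return ["pipenv", "run", "python", script, "-c", f"config.d/{config_name}.json"]


def run_all(
    script: str,
    config_names: Sequence[str],
    runner: Callable[[List[str]], Any] = subprocess.run,
) -> List[str]:
    """Run each config in turn and return the names whose command failed."""
    failed = []
    for name in config_names:
        result = runner(build_command(script, name))
        # Assumption: a non-zero exit status signals a failed run.
        if result.returncode != 0:
            failed.append(name)
    return failed
```

From the s3_to_sfts directory, `run_all("s3_to_sfts.py", S3_TO_SFTS_CONFIGS)` would run the six commands in order and return the names of any that failed. The reports still need to be read, since a zero exit status alone does not show what was transferred.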
Example report from running redshift_to_s3.py with config.d/pmrp_qdata_daily.json:
Report: redshift_to_s3.py
Config: config.d/pmrp_qdata_daily.json
DML: pmrp_qdata_daily.sql
Microservice started at: 2023-02-23 09:30:47-0800 (PST), ended at: 2023-02-23 09:30:48-0800 (PST), elapsing: 0:00:01.386101.
Objects loaded to S3 /batch: 1/1
Objects successfully loaded to S3 /batch: 1
List of objects successfully loaded to S3 /batch
Objects to store: 1
Objects stored to s3 /client: 1
List of objects stored to S3 /client:
1: client/doug_test/GDXDSD-5355/pmrp_qdata/daily/Jun_2022_change/pmrp_qdata_20230223T173047_part000.csv
Objects to process: 1
Objects processed to s3 /good: 1
List of objects processed to S3 /good:
1: processed/good/client/doug_test/GDXDSD-5355/pmrp_qdata/daily/Jun_2022_change/pmrp_qdata_20230223T173047_part000
Example report from running s3_to_sfts.py with config.d/pmrp_qdata_range.json:
Report: s3_to_sfts.py
Config: config.d/pmrp_qdata_range.json
Microservice started at: 2023-02-24 13:05:25-0800 (PST), ended at: 2023-02-24 13:07:18-0800 (PST), elapsing: 0:01:52.582032.
Items to process: 1
Objects successfully processed to s3: 1
Objects unsuccessfully processed to s3: 0
Objects successfully processed to sfts: 1
Objects loaded to S3 /good:
1: processed/good/client/doug_test/GDXDSD-5355/qdata-sfts_sbc/range/Jun_2022_change/pmrp_qdata_20180508_20220622_20230224T205455_part000
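When comparing a report against what is expected, the counters in an s3_to_sfts report like the one above can be pulled out and checked mechanically. The helpers below are a hypothetical sketch based only on the counter labels visible in that report; other runs may print different lines.

```python
import re
from typing import Dict

# Matches the four counter lines seen in the s3_to_sfts report.
COUNTER = re.compile(
    r"(Items to process|Objects (?:un)?successfully processed to (?:s3|sfts)): (\d+)"
)


def parse_counters(report: str) -> Dict[str, int]:
    """Extract the counter labels and values from report text."""
    return {label: int(value) for label, value in COUNTER.findall(report)}


def run_is_clean(counters: Dict[str, int]) -> bool:
    """True if nothing failed and every item reached both S3 and SFTS."""
    items = counters.get("Items to process", 0)
    return (
        counters.get("Objects unsuccessfully processed to s3", 0) == 0
        and counters.get("Objects successfully processed to s3", 0) == items
        and counters.get("Objects successfully processed to sfts", 0) == items
    )
```

The pattern works whether the counters appear one per line or run together on a single line.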