This PR does the following:
Adds a config file and DML to generate a daily SDPR feed that includes records that were recently updated but were created on a previous day
Adds a config file and DML to generate an hourly SDPR feed that includes records from the start of the day up to the top of the hour in which the microservice is run
Updates redshift_to_s3.py to support different delimiters set in the config, defaulting to | (pipe) if no delimiter is specified
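The delimiter fallback described above can be illustrated with a minimal sketch. This is not the actual code in redshift_to_s3.py: the "delimiter" key name and both helper functions are assumptions made for illustration only.

```python
import csv
import io

DEFAULT_DELIMITER = "|"  # pipe, used when the config does not set a delimiter


def resolve_delimiter(config):
    """Return the configured delimiter, or the pipe default if none is set."""
    # "delimiter" is an assumed key name; the real config key may differ.
    return config.get("delimiter") or DEFAULT_DELIMITER


def render_rows(rows, config):
    """Serialize rows as delimited text using the resolved delimiter."""
    buffer = io.StringIO()
    writer = csv.writer(
        buffer, delimiter=resolve_delimiter(config), lineterminator="\n"
    )
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample rows, for illustration only.
rows = [["id", "office"], ["1", "Victoria"]]
print(render_rows(rows, {}))                   # no delimiter set: pipe is used
print(render_rows(rows, {"delimiter": ","}))   # delimiter taken from the config
```

Treating an empty string the same as a missing key is a design choice in this sketch, so a blank config value still produces pipe-delimited output.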
To quickly review the changes:
For the daily incremental data, review yesterday's file in the theq_sdpr/daily/Oct2022_incremental_update s3 bucket and compare it with the file for the same date in the old s3 bucket. The incremental update file has an additional "update_flag" column and should contain at least all of the data that was in the old file. Any records from a previous day that were recently updated should appear at the top of the incremental update file, marked with an "update_flag" value of 1.
For the hourly data, review the files in the theq_sdpr/hourly/Oct2022_incremental_update s3 bucket. Check each file's last-modified time and confirm that the file contains no data from after the top of that last-modified hour.
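Both review checks could be done by script once the files are downloaded locally. The following is a rough sketch under several assumptions: the files are pipe-delimited with a header row, the "created_at" timestamp column is a hypothetical name (the real column is not given here), and the sample rows are made up.

```python
import csv
import io
from datetime import datetime


def read_rows(text, delimiter="|"):
    """Parse delimited text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def missing_from_incremental(old_rows, incremental_rows, flag_column="update_flag"):
    """Return rows of the old file that do not appear in the incremental file.

    The flag column is ignored because only the incremental file has it.
    """
    def key(row):
        return tuple(sorted((k, v) for k, v in row.items() if k != flag_column))

    seen = {key(row) for row in incremental_rows}
    return [row for row in old_rows if key(row) not in seen]


def rows_after_cutoff(rows, last_modified, time_column):
    """Return rows timestamped after the top of the last-modified hour."""
    cutoff = last_modified.replace(minute=0, second=0, microsecond=0)
    return [r for r in rows if datetime.fromisoformat(r[time_column]) > cutoff]


# Daily check: the incremental file should be a superset of the old file.
old = read_rows("id|office\n1|Victoria\n2|Kelowna\n")
incremental = read_rows(
    "id|office|update_flag\n7|Nanaimo|1\n1|Victoria|0\n2|Kelowna|0\n"
)
print("update_flag present:", "update_flag" in incremental[0])
print("rows missing from incremental:", missing_from_incremental(old, incremental))

# Hourly check: "created_at" is a hypothetical column name.
hourly = read_rows("id|created_at\n1|2022-10-20 08:15:00\n2|2022-10-20 09:05:00\n")
late = rows_after_cutoff(hourly, datetime(2022, 10, 20, 9, 3, 12), "created_at")
print("rows after the top of the hour:", late)
```

An empty list from either check means the file passed. The value used for records that were not updated (0 in the sample) is also an assumption; the comparison ignores that column either way.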
Testing these changes requires some modifications to the existing configs so that there are no effects on the production system:
Log into microservice_ssm
awsmfa prod <AWS OTP>
microservice_ssm
Navigate to the development branch
cd branch/GDXDSD-5053-SDPR-incremental-update/sfts/
Modify the config.d/sdpr_last_full_day_incremental.json "directory" value to be written to a different s3 bucket, ex.
"directory": "doug_test",
Manually run the microservice and inspect the generated file to test the SDPR daily incremental change. It should contain an "update_flag" column and should only include records created yesterday, or records from previous days that were updated yesterday
pipenv run python redshift_to_s3.py -c config.d/sdpr_last_full_day_incremental.json
Modify the config.d/sdpr_hourly.json "directory" value to be written to a different s3 bucket, ex.
"directory": "doug_test",
Manually run the microservice and inspect the generated file to test the SDPR hourly change. It should only contain data that was created today up to the top of the hour that the microservice was run
pipenv run python redshift_to_s3.py -c config.d/sdpr_hourly.json
Modify an unrelated config file "directory" value to be written to a different s3 bucket, ex. for config.d/pmrp_qdata_daily.json
"directory": "doug_test",
Manually run the microservice and inspect the generated file to test the delimiter change. The file should be | (pipe) delimited
pipenv run python redshift_to_s3.py -c config.d/pmrp_qdata_daily.json
Undo the changes you made for testing with the following command
git restore .
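The "directory" override used in the testing steps above could also be scripted rather than edited by hand. This is a minimal sketch, assuming "directory" is a top-level key in each config file; set_directory is a hypothetical helper, not part of the repository.

```python
import json
import tempfile
from pathlib import Path


def set_directory(config_path, directory):
    """Point a config's "directory" value at a test location.

    Returns the previous value so it can be checked or logged.
    Assumes "directory" is a top-level key in the JSON config.
    """
    path = Path(config_path)
    config = json.loads(path.read_text())
    previous = config.get("directory")
    config["directory"] = directory
    path.write_text(json.dumps(config, indent=2) + "\n")
    return previous


# Demonstration against a throwaway file rather than a real config.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "example.json"
    demo.write_text(json.dumps({"directory": "original_dir"}))
    print(set_directory(demo, "doug_test"))            # previous value
    print(json.loads(demo.read_text())["directory"])   # new value
```

On the branch this would be called as, for example, set_directory("config.d/sdpr_hourly.json", "doug_test"). Rewriting the file with json.dumps may change its formatting, so git restore . is still needed afterwards to undo the change.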