bcgov / GDX-Analytics-microservice

The collection of GDX-Analytics Python microservices used to load and process data between systems and services.
Apache License 2.0

GDXDSD-5429 SDPR incremental update and hourly data (in draft, waiting on SDPR) #136

Closed by doughon 1 year ago

doughon commented 2 years ago

This PR does the following:

  1. Adds a config file and DML to generate a daily SDPR feed that, in addition to records created on the previous day, includes records that were created on an earlier day but updated on the previous day
  2. Adds a config file and DML to generate an hourly SDPR feed that includes records from the start of the day up to the top of the hour at which the microservice is run
  3. Updates redshift_to_s3.py to support a delimiter set in the config, defaulting to | (pipe) if no delimiter is specified
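The two SDPR feeds differ mainly in the time window they cover. The sketch below shows one way to compute those windows in Python; it is an illustration under stated assumptions, not the DML in this PR. The helper names are hypothetical, timestamps are naive (timezone handling is left out), and each window is half-open, [start, end). Under this reading, the daily feed would select rows created inside the daily window plus rows created before its start whose update time falls inside it.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def start_of_day(moment: datetime) -> datetime:
    """Truncate a timestamp to midnight of the same day."""
    return moment.replace(hour=0, minute=0, second=0, microsecond=0)


def daily_window(run_time: datetime) -> tuple[datetime, datetime]:
    """Hypothetical helper: [start, end) covering the last full day (yesterday)."""
    today = start_of_day(run_time)
    return today - timedelta(days=1), today


def hourly_window(run_time: datetime) -> tuple[datetime, datetime]:
    """Hypothetical helper: [start, end) from midnight today up to the top of the hour."""
    top_of_hour = run_time.replace(minute=0, second=0, microsecond=0)
    return start_of_day(run_time), top_of_hour


if __name__ == "__main__":
    run_time = datetime(2022, 10, 14, 9, 37, 12)
    print("daily: ", daily_window(run_time))
    print("hourly:", hourly_window(run_time))
```

With these definitions, a run during the first hour of the day (e.g. 00:15) produces an empty hourly window, because the start of the day and the top of the hour coincide.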

To quickly review the changes:

  1. For the daily incremental data, review yesterday's file in the theq_sdpr/daily/Oct2022_incremental_update S3 directory and compare it with the file for the same date in the old S3 directory. The incremental update file should have an additional "update_flag" column and should contain at least all of the data that was in the old file. Any records that were created on a previous day but updated recently should appear at the top of the incremental update file, with an "update_flag" value of 1.
  2. For the hourly data, review the files in the theq_sdpr/hourly/Oct2022_incremental_update S3 directory. Check each file's last modified time and confirm that the file contains no data from after the top of that hour.
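The daily comparison in step 1 can be scripted once both files have been downloaded. The following is a minimal sketch, assuming both files are pipe delimited with a header row and have already been read into strings; the function name is hypothetical. It checks for the "update_flag" column, confirms that every row of the old file also appears in the incremental file (comparing only the old file's columns), and checks that all rows flagged with 1 come before the unflagged ones.

```python
from __future__ import annotations

import csv
import io


def compare_daily_files(old_text: str, new_text: str, delimiter: str = "|") -> list[str]:
    """Hypothetical check: return problems found comparing old and incremental files."""
    problems: list[str] = []
    old_reader = csv.DictReader(io.StringIO(old_text), delimiter=delimiter)
    new_reader = csv.DictReader(io.StringIO(new_text), delimiter=delimiter)
    old_rows, new_rows = list(old_reader), list(new_reader)
    old_cols = old_reader.fieldnames or []
    new_cols = new_reader.fieldnames or []

    # The incremental file should carry the extra update_flag column.
    if "update_flag" not in new_cols:
        problems.append("missing update_flag column")

    # Every row in the old file should also appear in the incremental file,
    # comparing only the columns that the old file has.
    seen = {tuple(row.get(col) for col in old_cols) for row in new_rows}
    for row in old_rows:
        if tuple(row[col] for col in old_cols) not in seen:
            problems.append(f"row missing from incremental file: {row}")

    # Rows flagged as updates (update_flag == "1") should all come first.
    flags = [row.get("update_flag") == "1" for row in new_rows]
    if flags != sorted(flags, reverse=True):
        problems.append("update_flag=1 rows are not all at the top")

    return problems
```

An empty list means the incremental file passed all three checks.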

Testing these changes requires some temporary modifications to the existing configs so that the production system is not affected:

Log into microservice_ssm

awsmfa prod <AWS OTP>
microservice_ssm

Navigate to the development branch:

cd branch/GDXDSD-5053-SDPR-incremental-update/sfts/

Modify the "directory" value in config.d/sdpr_last_full_day_incremental.json so that output is written to a different S3 directory, e.g. "directory": "doug_test",
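The "directory" edit can also be made with a short script instead of a text editor. This is a hypothetical helper, not part of the repository; it assumes the config file is plain JSON with a top-level "directory" key. Because it re-serializes the JSON, the file's indentation may change as well, which `git restore .` in the final step also reverts.

```python
import json
from pathlib import Path


def with_test_directory(config_text: str, test_directory: str) -> str:
    """Return the config JSON with its "directory" key pointed at a test location."""
    config = json.loads(config_text)
    config["directory"] = test_directory
    return json.dumps(config, indent=4) + "\n"


def redirect_output(config_path: str, test_directory: str) -> None:
    """Hypothetical helper: rewrite a config file in place for a test run."""
    path = Path(config_path)
    path.write_text(with_test_directory(path.read_text(), test_directory))
```

For example, `redirect_output("config.d/sdpr_last_full_day_incremental.json", "doug_test")` would apply the change described in this step, and the same call works for the other config files modified below.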

Manually run the microservice to test the SDPR daily incremental change, then inspect the generated file. It should contain an "update_flag" column and should include only records created yesterday, or records from previous days that were updated yesterday:

pipenv run python redshift_to_s3.py -c config.d/sdpr_last_full_day_incremental.json
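The rule in this step can be checked row by row. The sketch below assumes hypothetical "created_at" and "updated_at" columns with a `%Y-%m-%d %H:%M:%S` timestamp format (the real SDPR column names and formats may differ) and a pipe-delimited file with a header row. A row is expected if it was created yesterday and not flagged, or created on an earlier day, updated yesterday, and flagged with 1; anything else is returned for inspection.

```python
from __future__ import annotations

import csv
import io
from datetime import date, datetime

# Hypothetical column names and timestamp format; the real SDPR feed may differ.
CREATED_COL = "created_at"
UPDATED_COL = "updated_at"
TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_day(value: str) -> date | None:
    """Parse a timestamp string to its calendar day; empty values become None."""
    return datetime.strptime(value, TS_FORMAT).date() if value else None


def unexpected_daily_rows(text: str, yesterday: date, delimiter: str = "|") -> list[dict]:
    """Return rows that should not appear in the daily incremental file."""
    unexpected = []
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        created = parse_day(row[CREATED_COL])
        updated = parse_day(row[UPDATED_COL])
        flagged = row["update_flag"] == "1"
        # Case 1: created yesterday, so it is not an update of an older record.
        created_yesterday = created == yesterday and not flagged
        # Case 2: created on an earlier day but updated yesterday, flagged as 1.
        updated_yesterday = (
            created is not None and created < yesterday and updated == yesterday and flagged
        )
        if not (created_yesterday or updated_yesterday):
            unexpected.append(row)
    return unexpected
```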

Modify the "directory" value in config.d/sdpr_hourly.json so that output is written to a different S3 directory, e.g. "directory": "doug_test",

Manually run the microservice to test the SDPR hourly change, then inspect the generated file. It should contain only data created today, up to the top of the hour at which the microservice was run:

pipenv run python redshift_to_s3.py -c config.d/sdpr_hourly.json
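To check the hourly file, each row's creation time can be compared against the window from midnight up to the top of the run hour. This is a sketch under assumptions: the "created_at" column name and timestamp format are hypothetical, the file is pipe delimited with a header row, and the window is treated as half-open, so a row stamped exactly at the top of the hour counts as outside it.

```python
from __future__ import annotations

import csv
import io
from datetime import datetime

CREATED_COL = "created_at"  # hypothetical column name
TS_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format


def rows_outside_hourly_window(text: str, run_time: datetime, delimiter: str = "|") -> list[dict]:
    """Return rows created before midnight today or at/after the top of the run hour."""
    day_start = run_time.replace(hour=0, minute=0, second=0, microsecond=0)
    top_of_hour = run_time.replace(minute=0, second=0, microsecond=0)
    outside = []
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        created = datetime.strptime(row[CREATED_COL], TS_FORMAT)
        if not (day_start <= created < top_of_hour):
            outside.append(row)
    return outside
```

Passing the file's last modified time as `run_time` mirrors the manual review described earlier.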

Modify the "directory" value in an unrelated config file so that output is written to a different S3 directory, e.g. in config.d/pmrp_qdata_daily.json: "directory": "doug_test",

Manually run the microservice to test the delimiter change, then inspect the generated file. Because this config does not specify a delimiter, the file should be | (pipe) delimited:

pipenv run python redshift_to_s3.py -c config.d/pmrp_qdata_daily.json
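The default-to-pipe behaviour and the check in this step can be illustrated as follows. This is not the code in redshift_to_s3.py: the "delimiter" config key name is an assumption, and the helpers are hypothetical. It looks up the delimiter with a pipe fallback, writes a small delimited sample, and tests whether every line splits into the expected number of fields for a given delimiter.

```python
import csv
import io

DEFAULT_DELIMITER = "|"


def delimiter_from_config(config: dict) -> str:
    """Return the configured delimiter (hypothetical "delimiter" key), else a pipe."""
    return config.get("delimiter", DEFAULT_DELIMITER)


def to_delimited(header: list, rows: list, delimiter: str) -> str:
    """Serialize a header and rows using the given delimiter."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def looks_delimited(text: str, delimiter: str, expected_columns: int) -> bool:
    """True if every non-empty line splits into expected_columns fields."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return all(len(fields) == expected_columns for fields in reader if fields)
```

For a config without a delimiter, `to_delimited(["id", "name"], [["1", "a"]], delimiter_from_config({}))` produces `id|name` and `1|a` lines, which pass `looks_delimited(..., "|", 2)` but fail the same check with a comma.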

Undo the changes you made for testing with the following command:

git restore .