bcgov / GDX-Analytics-microservice

The collection of GDX-Analytics Python microservices used to load and process data between systems and services.
Apache License 2.0

Gdxdsd 5429 sdpr incremental update #168

Closed doughon closed 1 year ago

doughon commented 1 year ago

This PR does the following:

  1. Adds an SDPR daily incremental feed with an additional 'update_flag' column, which is 1 if a record was updated on a day after the day it was created and 0 otherwise. This data will be stored in https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=client/theq_sdpr/daily-incremental/v01-Mar_2023_incremental/&showversions=false
  2. Adds an SDPR hourly feed containing data from the start of the day up to the hour at which the job is run. This data will be stored in https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=client/theq_sdprabi/sdpr_hourly/v01-Mar_2023_incremental/&showversions=false
  3. Adds the ability to specify how the data is delimited.
  4. In redshift_to_s3.py, adds a trailing / to the prefix used to determine which S3 objects need to be processed.
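
To illustrate changes 1 and 4, here is a minimal Python sketch. It is not the code in this PR: the flag is presumably computed in the new DML (sdpr_last_full_day_incremental.sql), and the names `update_flag`, `normalize_prefix`, and `keys_under` are hypothetical. The sketch assumes both timestamps are in the same time zone. The overlapping-prefix scenario is one plausible reason for the trailing slash, not one stated in this PR, and the object keys are made up for the example.

```python
from datetime import datetime


def update_flag(created_at: datetime, updated_at: datetime) -> int:
    """Return 1 if the record was updated on a later calendar day
    than the day it was created, and 0 otherwise.

    Assumption: both timestamps are in the same time zone.
    """
    return 1 if updated_at.date() > created_at.date() else 0


def normalize_prefix(prefix: str) -> str:
    """Append a trailing slash unless the prefix already ends with one."""
    return prefix if prefix.endswith("/") else prefix + "/"


def keys_under(prefix: str, keys: list[str]) -> list[str]:
    """Return the keys that start with the given prefix (plain string match)."""
    return [key for key in keys if key.startswith(prefix)]


# Hypothetical object keys, for illustration only.
keys = [
    "client/theq_sdpr/daily/sdpr_a",
    "client/theq_sdpr/daily-incremental/v01-Mar_2023_incremental/sdpr_b",
]

# Without the trailing slash, the "daily" prefix also matches "daily-incremental".
loose = keys_under("client/theq_sdpr/daily", keys)

# With the trailing slash, only objects directly under "daily/" match.
strict = keys_under(normalize_prefix("client/theq_sdpr/daily"), keys)
```

Because prefix matching is a plain string comparison, the normalized prefix keeps the daily job from picking up objects that belong to the daily-incremental feed.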

Please note that some of the config files were modified so that the tests can run. The modified configs will be added to the ticket in a zip file.

Testing instructions:

  1. Review the new DMLs sdpr_last_full_day_incremental.sql and sdpr_hourly.sql.
  2. Review the redshift_to_s3 README.md to check the updated delimiter info and to catch any errors
  3. Log in to the EC2 instance with the following commands:
    awsmfa prod <AWS OTP>
    microservice_ssm
    cd /home/microservice/branch/GDXDSD-5429-SDPR-incremental-update/redshift_to_s3
  4. Run the following commands and compare their output to what is expected. (Please note that the sdpr_historical command will take minutes to complete.)
    pipenv run python redshift_to_s3.py -c config.d/sdpr_historical.json 
    pipenv run python redshift_to_s3.py -c config.d/sdpr_last_full_day.json
    pipenv run python redshift_to_s3.py -c config.d/sdpr_last_full_day_incremental.json
    pipenv run python redshift_to_s3.py -c config.d/sdpr_hourly.json
    
    ***The microservice ran successfully***

Report: redshift_to_s3.py

Config: config.d/sdpr_historical.json

DML: sdpr_historical.sql

Microservice started at: 2023-03-06 09:25:20-0800 (PST), ended at: 2023-03-06 09:25:58-0800 (PST), elapsing: 0:00:38.346422.

Objects loaded to S3 /batch: 1/1
Objects successfully loaded to S3 /batch: 1

List of objects successfully loaded to S3 /batch

  1. processed/batch/client/doug_test/GDXDSD-5429/theq_sdpr/historical/sdpr_20230306T172520

Objects to store: 1
Objects stored to s3 /client: 1

List of objects stored to S3 /client:
1: client/doug_test/GDXDSD-5429/theq_sdpr/historical/sdpr_20230306T172520_part000

Objects to process: 1
Objects processed to s3 /good: 1

List of objects processed to S3 /good:
1: processed/good/client/doug_test/GDXDSD-5429/theq_sdpr/historical/sdpr_20230306T172520_part000


5. Check that the files appear in the S3 processed batch bucket:
- SDPR Historical: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/batch/client/doug_test/GDXDSD-5429/theq_sdpr/historical/&showversions=false
- SDPR Daily: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/batch/client/doug_test/GDXDSD-5429/theq_sdpr/daily/&showversions=false
- SDPR Daily Incremental: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/batch/client/doug_test/GDXDSD-5429/theq_sdpr/daily-incremental/v01-Mar_2023_incremental/&showversions=false
- SDPR Hourly: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/batch/client/doug_test/GDXDSD-5429/theq_sdprabi/sdpr_hourly/v01-Mar_2023_incremental/&showversions=false
6. Check that the files appear in the S3 client bucket:
- SDPR Historical: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=client/doug_test/GDXDSD-5429/theq_sdpr/historical/&showversions=false
- SDPR Daily: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=client/doug_test/GDXDSD-5429/theq_sdpr/daily/&showversions=false
- SDPR Daily Incremental: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=client/doug_test/GDXDSD-5429/theq_sdpr/daily-incremental/v01-Mar_2023_incremental/&showversions=false
- SDPR Hourly: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=client/doug_test/GDXDSD-5429/theq_sdprabi/sdpr_hourly/v01-Mar_2023_incremental/&showversions=false
7. Check that the files appear in the S3 processed good bucket:
- SDPR Historical: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5429/theq_sdpr/historical/&showversions=false
- SDPR Daily: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5429/theq_sdpr/daily/&showversions=false
- SDPR Daily Incremental: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5429/theq_sdpr/daily-incremental/v01-Mar_2023_incremental/&showversions=false
- SDPR Hourly: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5429/theq_sdprabi/sdpr_hourly/v01-Mar_2023_incremental/&showversions=false
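
Steps 5 to 7 can also be checked against a listing of object keys rather than through the console. The sketch below derives the three expected keys from the naming pattern visible in the sample report for sdpr_historical (the `processed/batch/` and `processed/good/` prefixes and the `_part000` suffix). `expected_keys` and `missing_stages` are hypothetical helpers, not part of redshift_to_s3.py, and the set of present keys is assumed to come from whatever S3 listing you have on hand.

```python
def expected_keys(client_prefix: str, object_name: str,
                  part_suffix: str = "_part000") -> dict[str, str]:
    """Build the keys expected at each stage for one run.

    Assumption: the layout follows the sample report, where the batch
    object has no part suffix and the client and good objects do.
    """
    return {
        "batch": f"processed/batch/{client_prefix}{object_name}",
        "client": f"{client_prefix}{object_name}{part_suffix}",
        "good": f"processed/good/{client_prefix}{object_name}{part_suffix}",
    }


def missing_stages(expected: dict[str, str], present_keys: set[str]) -> list[str]:
    """Return the stage names whose expected key is absent from the listing."""
    return [stage for stage, key in expected.items() if key not in present_keys]


# Values taken from the sample sdpr_historical report.
expected = expected_keys(
    "client/doug_test/GDXDSD-5429/theq_sdpr/historical/",
    "sdpr_20230306T172520",
)

# A listing that contains all three objects named in the report.
present = {
    "processed/batch/client/doug_test/GDXDSD-5429/theq_sdpr/historical/sdpr_20230306T172520",
    "client/doug_test/GDXDSD-5429/theq_sdpr/historical/sdpr_20230306T172520_part000",
    "processed/good/client/doug_test/GDXDSD-5429/theq_sdpr/historical/sdpr_20230306T172520_part000",
}

print(missing_stages(expected, present))  # an empty list means all three stages were found
```

The same check applies to the daily, daily incremental, and hourly runs by swapping in their prefixes and the object names from their own reports.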