bcgov / GDX-Analytics-microservice

The collection of GDX-Analytics Python microservices used to load and process data between systems and services.
Apache License 2.0

Gdxdsd 5362 split sfts microservices into separate folders #161

Closed doughon closed 1 year ago

doughon commented 1 year ago

This PR does the following:

  1. Splits the two microservices currently in sfts/ into the folders redshift_to_s3/ and s3_to_sfts/, preserving the histories of the files
  2. Modifies the configs for each microservice to remove options that existed only because the two microservices shared the config files
  3. Edits the README files so that each is specific to its own microservice
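Whether a file's history survived the move can be checked with `git log --follow`, which keeps listing commits across renames for a single path. The following is a minimal sketch, not part of this PR: the helper names `follow_log_command` and `count_commits` are hypothetical, and the injectable `runner` parameter exists only so the logic can be exercised without a repository.

```python
import subprocess


def follow_log_command(path):
    """Build the git command that lists one file's commits across renames."""
    return ["git", "log", "--follow", "--oneline", "--", path]


def count_commits(path, runner=subprocess.run):
    """Return the number of commits git reports for `path`.

    `runner` defaults to subprocess.run; it is a parameter only so that a
    stand-in can be supplied when no repository is available.
    """
    result = runner(
        follow_log_command(path),
        capture_output=True,
        text=True,
        check=True,
    )
    return len(result.stdout.splitlines())
```

Run from the repository root, `count_commits("s3_to_sfts/s3_to_sfts.py")` returning more than one commit would suggest that the history from before the move is still reachable.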

To review the changes:

  1. The sfts/ folder has been renamed to redshift_to_s3
  2. The folder s3_to_sfts/ has been created
  3. s3_to_sfts.py has been moved to the s3_to_sfts/ folder
  4. The README, files related to the pipfiles, and config.d/ and its contents have been copied into s3_to_sfts
  5. The SDPR configs have been removed from s3_to_sfts as they are never uploaded there
  6. Options related to SFTS have been removed from the configs in redshift_to_s3, along with the references to them in its README
  7. Options related to dml, sql, and dates have been removed from the configs in s3_to_sfts, along with the references to them in its README
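One way a reviewer might spot options left behind after the split is to scan each config for key names that mention the removed areas. This is a rough sketch under a stated assumption: that the relevant keys contain terms such as `sfts`, `dml`, `sql`, or `date` in their names, which this PR does not confirm. The function `leftover_keys` and the keys in the example config are hypothetical.

```python
import json


def leftover_keys(config, forbidden):
    """Return top-level config keys whose names contain any forbidden term."""
    terms = [term.lower() for term in forbidden]
    return sorted(
        key for key in config if any(term in key.lower() for term in terms)
    )


# Hypothetical s3_to_sfts config; these key names are invented for illustration.
example = json.loads(
    '{"bucket": "example-bucket", "sfts_path": "/example", "dml": "example.sql"}'
)
print(leftover_keys(example, ["dml", "sql", "date"]))  # prints ['dml']
```

Substring matching is deliberately loose, so a term like `date` would also flag unrelated keys such as `update_interval`; any hit is a prompt to look at the config, not proof of a problem.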

Testing these changes requires modifications to the config files. You can see these modifications in the zip file attached to this ticket.

Testing redshift_to_s3

  1. Review its README to see if it makes sense or if anything needs to be changed
  2. Log in to the EC2 instance using the following commands
    awsmfa prod <AWS OTP>
    microservice_ssm
    cd /home/microservice/branch/GDXDSD-5362-split-sfts-microservices-into-separate-folders/redshift_to_s3
  3. Run the following commands and compare their output to what's expected. Note that the commands for pmrp_qdata_range and sdpr_historical each take minutes to run
    pipenv run python redshift_to_s3.py -c config.d/pmrp_all.json
    pipenv run python redshift_to_s3.py -c config.d/pmrp_date_range.json
    pipenv run python redshift_to_s3.py -c config.d/pmrp_max_date.json
    pipenv run python redshift_to_s3.py -c config.d/pmrp_qdata_daily.json
    pipenv run python redshift_to_s3.py -c config.d/pmrp_qdata_dates.json
    pipenv run python redshift_to_s3.py -c config.d/pmrp_qdata_range.json
    pipenv run python redshift_to_s3.py -c config.d/sdpr_historical.json
    pipenv run python redshift_to_s3.py -c config.d/sdpr_last_full_day.json
    
    ***The microservice ran successfully***

    Report: redshift_to_s3.py
    Config: config.d/pmrp_all.json
    DML: pmrp_date_range.sql
    Requested Dates: 20180929 to 20230128

    Microservice started at: 2023-02-07 12:59:53-0800 (PST), ended at: 2023-02-07 13:00:05-0800 (PST), elapsing: 0:00:12.011171.

    Objects to process: 1
    Objects loaded to S3: 1/1
    Objects successful loaded to S3: 1

    List of objects successfully loaded to S3
    1. client/doug_test/GDXDSD-5362/pmrp_gdx/pmrp_all/pmrp_20180929_20230128_20230207T205953

  4. Check to see if the files appear in the S3 client bucket:
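The redshift_to_s3 commands listed above can also be run as one batch. The sketch below builds the same `pipenv run python ... -c config.d/<name>.json` command for each config and collects the ones that fail. It assumes the microservice exits with a non-zero status on failure, which this PR does not state; `build_command` and `run_all` are hypothetical helper names.

```python
import subprocess

# Config names taken from the commands listed in this PR.
REDSHIFT_CONFIGS = [
    "pmrp_all",
    "pmrp_date_range",
    "pmrp_max_date",
    "pmrp_qdata_daily",
    "pmrp_qdata_dates",
    "pmrp_qdata_range",
    "sdpr_historical",
    "sdpr_last_full_day",
]


def build_command(script, config_name):
    """Build the pipenv command for one config file under config.d/."""
    return ["pipenv", "run", "python", script, "-c", f"config.d/{config_name}.json"]


def run_all(script, config_names, runner=subprocess.run):
    """Run each config in order and return the names that exited non-zero.

    Assumption: a failed run is signalled by a non-zero exit status.
    """
    failed = []
    for name in config_names:
        result = runner(build_command(script, name))
        if result.returncode != 0:
            failed.append(name)
    return failed
```

Called from the redshift_to_s3 folder as `run_all("redshift_to_s3.py", REDSHIFT_CONFIGS)`, an empty list would mean every run exited cleanly; each report still needs to be compared against the expected output by eye.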

Testing s3_to_sfts

  1. Review its README to see if it makes sense or if anything needs to be changed
  2. Review the histories of the files in s3_to_sfts to confirm that they were preserved after the files were moved or copied
  3. Navigate using the following command
    cd /home/microservice/branch/GDXDSD-5362-split-sfts-microservices-into-separate-folders/s3_to_sfts
  4. Run the following commands and compare their output to what's expected. Note that the command for pmrp_qdata_range takes minutes to run and that there are no commands for SDPR
    pipenv run python s3_to_sfts.py -c config.d/pmrp_all.json
    pipenv run python s3_to_sfts.py -c config.d/pmrp_date_range.json
    pipenv run python s3_to_sfts.py -c config.d/pmrp_max_date.json
    pipenv run python s3_to_sfts.py -c config.d/pmrp_qdata_daily.json
    pipenv run python s3_to_sfts.py -c config.d/pmrp_qdata_dates.json
    pipenv run python s3_to_sfts.py -c config.d/pmrp_qdata_range.json
    
    Report: s3_to_sfts.py
    Config: config.d/pmrp_all.json

    Microservice started at: 2023-02-07 13:20:20-0800 (PST), ended at: 2023-02-07 13:20:26-0800 (PST), elapsing: 0:00:06.422713.

    Items to process: 1
    Objects successfully processed to s3: 1
    Objects unsuccessfully processed to s3: 0
    Objects successfully processed to sfts: 1

    Objects loaded to S3 /good:
    1: processed/good/client/doug_test/GDXDSD-5362/pmrp_gdx/pmrp_all/pmrp_20180929_20230128_20230207T205953_part000


5. Check to see if the files appear in the s3 processed good bucket:
- pmrp_all: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5362/pmrp_gdx/pmrp_all/&showversions=false
- pmrp_date_range: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5362/pmrp_gdx/pmrp_date_range/&showversions=false
- pmrp_max_date: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5362/pmrp_gdx/pmrp_max_date/&showversions=false
- pmrp_qdata_daily: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5362/pmrp_qdata/daily/Jun_2022_change/&showversions=false
- pmrp_qdata_dates: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5362/pmrp_qdata/dates/Jun_2022_change/&showversions=false
- pmrp_qdata_range: https://s3.console.aws.amazon.com/s3/buckets/sp-ca-bc-gov-131565110619-12-microservices?region=ca-central-1&prefix=processed/good/client/doug_test/GDXDSD-5362/pmrp_qdata/range/Jun_2022_change/&showversions=false
6. Check to see if the files appear in SFTS
- pmrp_all: https://filetransfer.gov.bc.ca/human.aspx?r=1052649598&arg06=933537552&arg12=filelist
- pmrp_date_range: https://filetransfer.gov.bc.ca/human.aspx?r=1052649598&arg06=933537552&arg12=filelist
- pmrp_max_date: https://filetransfer.gov.bc.ca/human.aspx?r=1052649598&arg06=933537552&arg12=filelist
- pmrp_qdata_daily: https://filetransfer.gov.bc.ca/human.aspx?r=1742300809&orgid=9585&rd=1
- pmrp_qdata_dates: https://filetransfer.gov.bc.ca/human.aspx?r=1742300809&orgid=9585&rd=1
- pmrp_qdata_range: https://filetransfer.gov.bc.ca/human.aspx?r=1742300809&orgid=9585&rd=1
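When comparing the two reports, it can help to predict the processed key from the key that redshift_to_s3 reported. The sketch below is inferred only from the two pmrp_all sample reports in this PR, where the processed object is the client key prefixed with `processed/good/` and suffixed with `_part000`. Both the helper name `expected_good_key` and the assumption that other configs follow the same pattern are unconfirmed.

```python
def expected_good_key(client_key, suffix="_part000"):
    """Guess the processed/good key for an object reported by redshift_to_s3.

    Assumption: the pattern seen in the pmrp_all sample reports holds, with a
    single part per object.
    """
    return f"processed/good/{client_key}{suffix}"


reported = (
    "client/doug_test/GDXDSD-5362/pmrp_gdx/pmrp_all/"
    "pmrp_20180929_20230128_20230207T205953"
)
print(expected_good_key(reported))
```

If the printed key does not match what s3_to_sfts reports under "Objects loaded to S3 /good", the mismatch is worth a closer look before checking SFTS.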