bcgov / GDX-Analytics-microservice

The collection of GDX-Analytics Python microservices used to load and process data between systems and services.
Apache License 2.0
2 stars 0 forks source link

GDXDSD-6711 s3_to_redshift Python upgrade to v3.8 #220

Closed ozdemirozcelik closed 3 months ago

ozdemirozcelik commented 3 months ago

This PR does the following:

Testing intructions

For all scripts to be tested, connect to microservicessm ec2 node and ensure that you are on the testing branch **'GDXDSD-6711_python_3.8_s3_to_redshift_testbranch'** for all scripts. Testing branch has the necessary testing setup.

microservice_ssm cd /home/microservice/branch/GDXDSD-6711_python_3.8_s3_to_redshift/s3_to_redshift git switch GDXDSD-6711_python_3.8_s3_to_redshift_test_branch

s3_to_redshift.py

Run the script with Python v3.8 and confirm that the objects are ingested from S3 to Redshift

/home/microservice/.local/bin/pipenv run python3.8 s3_to_redshift.py config.d/GDXDSD-6711_test.microservice_iam_key_rotation.json

$ /home/microservice/.local/bin/pipenv run python3.8 s3_to_redshift.py config.d/GDXDSD-6711_test.microservice_iam_key_rotation.json
Report: s3_to_redshift.py

Config: config.d/GDXDSD-6711_test.microservice_iam_key_rotation.json

Microservice started at: 2024-07-19 16:12:41-0700 (PDT), ended at: 2024-07-19 16:13:01-0700 (PDT), elapsing: 0:00:19.800622.

Objects to process: 1
Objects successfully processed: 1
Objects that failed to process: 0
Objects output to 'processed/good': 1
Objects output to 'processed/bad': 0
Objects loaded to Redshift: 1
Empty Objects: 0

List of objects successfully fully ingested from S3, processed, loaded to S3 ('good'), and copied to Redshift:
1: client/test_microservice-iam-key/GDXDSD-6711-manual-v1.csv

Check the table value for the verification_phrase ' phrase_20240719 microservice_rs SELECT * FROM test.microservice_iam_key_rotation LIMIT 10; \q

$ microservice_rs 
SET
psql (9.2.24, server 8.0.2)
WARNING: psql version 9.2, server version 8.0.
         Some psql features might not work.
SSL connection (cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256)
Type "help" for help.

snowplow=> SELECT * FROM test.microservice_iam_key_rotation LIMIT 10;
 verification_phrase 
---------------------
 phrase_20240719
(1 row)
snowplow=> \q

cmslite_user_data_to_redshift.py

Create test tables

microservice_rs < ddl/GDXDSD-6711-test.gdxdsd_6711_groups.sql microservice_rs < ddl/GDXDSD-6711-test.gdxdsd_6711_user_access.sql microservice_rs < ddl/GDXDSD-6711-test.gdxdsd_6711_user_activity.sql microservice_rs < ddl/GDXDSD-6711-test.gdxdsd_6711_user_groups.sql

$ microservice_rs < ddl/GDXDSD-6711-test.gdxdsd_6711_groups.sql 
SET
CREATE TABLE
GRANT
$ microservice_rs < ddl/GDXDSD-6711-test.gdxdsd_6711_user_access.sql 
SET
CREATE TABLE
GRANT
$ microservice_rs < ddl/GDXDSD-6711-test.gdxdsd_6711_user_activity.sql 
SET
CREATE TABLE
GRANT
$ microservice_rs < ddl/GDXDSD-6711-test.gdxdsd_6711_user_groups.sql 
SET
CREATE TABLE
GRANT

Run the script with Python v3.8 and confirm that the objects are ingested from S3 to Redshift

/home/microservice/.local/bin/pipenv run python3.8 cmslite_user_data_to_redshift.py config.d/GDXDSD-6711_cmslite_user_group_metadata.json
Report cmslite_user_data_to_redshift.py
$ /home/microservice/.local/bin/pipenv run python3.8 cmslite_user_data_to_redshift.py config.d/GDXDSD-6711_cmslite_user_group_metadata.json
Report cmslite_user_data_to_redshift.py:

Config: config.d/GDXDSD-6711_cmslite_user_group_metadata.json

Microservice started at: 2024-07-23 09:20:57-0700 (PDT), ended at: 2024-07-23 09:22:51-0700 (PDT), elapsing: 0:01:54.406056.

Objects to process: 1
Objects successfully processed: 1
Objects that failed to process: 0
Objects output to 'processed/good': 1
Objects output to 'processed/bad': 0
Tables loaded to Redshift: 4/4

List of objects successfully fully ingested from S3, processed, loaded to S3 ('good'), and copied to Redshift:
client/oz_test/GDXDSD-6711/cmslite_gdx/cms-analytics-csv.2024-05-01.tgz

List of tables that were successfully loaded into Redshift:
test.gdxdsd_6711_cms_group
test.gdxdsd_6711_user_activity
test.gdxdsd_6711_user_group
test.gdxdsd_6711_user_status

asset_data_to_redshift.py

Create test tables

microservice_rs <ddl/GDXDSD-6711-test.asset_downloads.sql

$ microservice_rs <ddl/GDXDSD-6711-test.asset_downloads.sql 
SET
CREATE TABLE
GRANT

Run the script with Python v3.8 and confirm that the objects are ingested from S3 to Redshift

/home/microservice/.local/bin/pipenv run python3.8 asset_data_to_redshift.py config.d/GDXDSD-6711_intranet_assets.json

$ /home/microservice/.local/bin/pipenv run python3.8 asset_data_to_redshift.py config.d/GDXDSD-6711_intranet_assets.json
Report: asset_data_to_redshift.py

Config: config.d/GDXDSD-6711_intranet_assets.json

Microservice started at: 2024-07-23 10:38:48-0700 (PDT), ended at: 2024-07-23 10:39:25-0700 (PDT), elapsing: 0:00:36.687581.

Objects to process: 1
Objects successfully processed: 1
Objects that failed to process: 0
Objects output to 'processed/good': 1
Objects output to 'processed/bad': 0
Objects loaded to Redshift: 1
Empty Objects: 0

List of objects successfully fully ingested from S3, processed, loaded to S3 ('good'), and copied to Redshift:
1: client/oz_test/GDXDSD-6711/cmslite_gdx/intranet_apache.2020-01-05

build_derived_assets.py

Create test tables

microservice_rs < ddl/GDXDSD-6711-test.derived.asset_downloads_derived.sql

$ microservice_rs < ddl/GDXDSD-6711-test.derived.asset_downloads_derived.sql 
SET
CREATE TABLE
GRANT

Run the script with Python v3.8 and confirm that the objects are ingested from S3 to Redshift

/home/microservice/.local/bin/pipenv run python3.8 build_derived_assets.py config.d/GDXDSD-6711_intranet_assets.json

$ /home/microservice/.local/bin/pipenv run python3.8 build_derived_assets.py config.d/GDXDSD-6711_intranet_assets.json

Report build_derived_assets.py:

Config: config.d/GDXDSD-6711_intranet_assets.json

Microservice started at: 2024-07-23 11:19:05-0700 (PDT), ended at: 2024-07-23 11:19:32-0700 (PDT), elapsing: 0:00:27.133444.

Objects to process: 1
Objects that failed to process: 0
Objects loaded to Redshift: 1

List of tables successfully parsed, and copied to the derived table in Redshift:
test.gdxdsd_6711_asset_downloads

Cleanup

s3_to_redshift.py:

microservice_rs TRUNCATE TABLE test.microservice_iam_key_rotation;

$ microservice_rs 
SET
psql (9.2.24, server 8.0.2)
WARNING: psql version 9.2, server version 8.0.
         Some psql features might not work.
SSL connection (cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256)
Type "help" for help.

snowplow=> TRUNCATE TABLE test.microservice_iam_key_rotation;
TRUNCATE TABLE and COMMIT TRANSACTION
snowplow=> SELECT * FROM test.microservice_iam_key_rotation LIMIT 10;
 verification_phrase
---------------------
(0 rows)

cmslite_user_data_to_redshift.py

DROP TABLE IF EXISTS test.gdxdsd_6711_cms_group; DROP TABLE IF EXISTS test.gdxdsd_6711_user_activity; DROP TABLE IF EXISTS test.gdxdsd_6711_user_group; DROP TABLE IF EXISTS test.gdxdsd_6711_user_status;

snowplow=> DROP TABLE IF EXISTS test.gdxdsd_6711_cms_group;
DROP TABLE
snowplow=> DROP TABLE IF EXISTS test.gdxdsd_6711_user_activity;
DROP TABLE
snowplow=> DROP TABLE IF EXISTS test.gdxdsd_6711_user_group;
DROP TABLE
snowplow=> DROP TABLE IF EXISTS test.gdxdsd_6711_user_status;
DROP TABLE

asset_data_to_redshift.py & build_derived_assets.py

DROP TABLE IF EXISTS test.gdxdsd_6711_asset_downloads; DROP TABLE IF EXISTS test.gdxdsd_6711_asset_downloads_derived;

snowplow=> DROP TABLE IF EXISTS test.gdxdsd_6711_asset_downloads;
DROP TABLE
snowplow=> DROP TABLE IF EXISTS test.gdxdsd_6711_asset_downloads_derived;
DROP TABLE