consensusnetworks / casimir

🌊 Decentralized staking and asset management
https://casimir.co

Update CDK stack for ETL to handle redeploy #353

Closed shanejearley closed 1 year ago

shanejearley commented 1 year ago

New schemas require new buckets, but tables should be updateable. Should be a simple fix to the table definition.

shanejearley commented 1 year ago

@hawyar you mentioned you had read some notes on Glue versioning. We can add a simple versioning strategy to our analytics deployment. Also feel free to rename everything currently named ETL to analytics so it's more accurate.

hawyar commented 1 year ago

Regarding schema versioning, first we want to create a schema registry, casimir_schema_regsitry, managed by Glue. Then we create three schemas, event, wallet, and staking_action, matching what we already have in JSON Schema. When creating those schemas in Glue, we also have to choose a compatibility mode, which dictates what happens when we add, delete, or update fields or their types. I have chosen "no compatibility" (NONE in Glue), which gives us flexibility for now, but please advise here. CDK would then pick up the schema from the registry to create or update the table.
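
As a rough illustration of the plan above, the registry and the three schemas map onto the CloudFormation resource types AWS::Glue::Registry and AWS::Glue::Schema. The sketch below builds them as plain objects rather than through CDK so the shape is easy to see. The registry name, the logical IDs, the helper names, and the placeholder schema bodies are all assumptions for illustration, not the project's actual values.

```typescript
// Sketch only: the registry name, logical IDs, and schema bodies are placeholders.
type GlueResource = {
  Type: 'AWS::Glue::Registry' | 'AWS::Glue::Schema'
  Properties: Record<string, unknown>
  DependsOn?: string
}

// Hypothetical registry name used only in this example.
const registryName = 'example_schema_registry'

// Placeholder JSON Schema documents standing in for the real definitions.
const draft = 'http://json-schema.org/draft-07/schema#'
const schemas: Record<string, object> = {
  event: { $schema: draft, type: 'object', properties: {} },
  wallet: { $schema: draft, type: 'object', properties: {} },
  staking_action: { $schema: draft, type: 'object', properties: {} }
}

// Turn a schema name such as "staking_action" into a logical ID such as "StakingActionSchema".
function logicalId(name: string): string {
  const pascal = name
    .split('_')
    .map((part) => part.charAt(0).toUpperCase() + part.slice(1))
    .join('')
  return `${pascal}Schema`
}

// Build one registry resource plus one schema resource per entry.
function buildGlueResources(
  registry: string,
  definitions: Record<string, object>
): Record<string, GlueResource> {
  const resources: Record<string, GlueResource> = {
    SchemaRegistry: {
      Type: 'AWS::Glue::Registry',
      Properties: { Name: registry }
    }
  }
  for (const name of Object.keys(definitions)) {
    resources[logicalId(name)] = {
      Type: 'AWS::Glue::Schema',
      DependsOn: 'SchemaRegistry',
      Properties: {
        Name: name,
        Registry: { Name: registry },
        DataFormat: 'JSON',
        // NONE accepts any new schema version without a compatibility check.
        Compatibility: 'NONE',
        SchemaDefinition: JSON.stringify(definitions[name])
      }
    }
  }
  return resources
}
```

Calling buildGlueResources(registryName, schemas) yields four resources: the registry and one schema each for event, wallet, and staking_action. Switching Compatibility to a stricter mode such as BACKWARD later would be a one-line change per schema.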

shanejearley commented 1 year ago

Now passing the major version number of @casimir/data into the bucket and table names. Schema changes require a major version bump (otherwise the deploy will clash and fail). Maybe we can add an automated check for this beforehand; for now we can check in review. Note, I forgot to add the bucket name to the output bucket, but we may remove this from CDK anyway. Thank you @hawyar
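
A minimal sketch of the naming approach and of the kind of automated check mentioned above, assuming the package version is a semver string. The helper names and the base names in the usage note are hypothetical, and the change check simply compares serialized JSON, so a key reordering would count as a schema change.

```typescript
// Extract the major version from a semver string such as "1.4.2" or "v2.0.0-beta.1".
function majorVersion(version: string): number {
  const match = /^v?(\d+)\.\d+\.\d+/.exec(version.trim())
  if (!match) {
    throw new Error(`Not a semver version: ${version}`)
  }
  return Number(match[1])
}

// S3 bucket names allow hyphens but not underscores, so normalize to hyphens.
function versionedBucketName(base: string, version: string): string {
  const normalized = base.toLowerCase().replace(/_/g, '-')
  return `${normalized}-v${majorVersion(version)}`
}

// Table names are safest with underscores, so normalize the other way.
function versionedTableName(base: string, version: string): string {
  const normalized = base.toLowerCase().replace(/-/g, '_')
  return `${normalized}_v${majorVersion(version)}`
}

type Release = { version: string; schema: object }

// Pre-deploy check: a changed schema is only allowed together with a major version bump.
function schemaChangeAllowed(previous: Release, next: Release): boolean {
  const changed = JSON.stringify(previous.schema) !== JSON.stringify(next.schema)
  if (!changed) {
    return true
  }
  return majorVersion(next.version) > majorVersion(previous.version)
}
```

For example, versionedBucketName('casimir-etl-staking_action', '1.4.2') gives 'casimir-etl-staking-action-v1' and versionedTableName('staking_action', '1.4.2') gives 'staking_action_v1'. A check like schemaChangeAllowed could run in CI against the last released schema so that a missing bump fails before deploy rather than in review.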