Closed sumiat closed 4 months ago
Excited for this ADR! I've listed some additional options and decision criteria I would recommend incorporating into this ADR:
Thanks @widal001! I added your comment to the body of the issue.
@widal001 noting I pulled the orchestration tools out of this, as the transformation tool and orchestration tool can be decoupled. I like your list and would include them in an upcoming orchestration tool ADR!
Aaron mentioned https://hub.meltano.com/singer/spec/ as well. I'm not sure if that would be part of the transformation layer or another layer, but figured I'd mention it here for everyone's situational awareness.
I'm combining this with https://github.com/HHS/simpler-grants-gov/issues/1278 and will have a single ADR update that captures our loading from Oracle, orchestration with step functions, and transformation with python.
Description
Once the https://github.com/HHS/simpler-grants-gov/issues/54 is completed, we will have a table being replicated from Grants.gov's Oracle database to Simpler's Postgres database in order to store and share opportunity data for the API.
In https://github.com/HHS/simpler-grants-gov/issues/990 and in the next stages of the API development, many more tables will be introduced. We will want a process for transforming the current Grants.gov data model into a new data model. The transformation steps will likely include small changes like column types, but also larger changes like table names, concatenating and merging fields, or other more complex transformations.
It is worth noting that alembic owns the schema changes and updates through migrations.
This task is to create an ADR to test the tools and workflow for running those data transformations.
NOTE: This does not encompass the data orchestration tooling such as Airflow/Dagster NOTE: We are using the term ELT vs ETL because DMS is handling the Extract and Load steps.
Approvers
@lucasmbrown-usds Julius Narmada @widal001
Options
Decision Criteria
Review the decision criteria with key stakeholders (Lucas, Narmada, Julius, Billy) before starting this ADR
Definition of Done
documentation/decisions/adr
with status "Accepted"