HHS / simpler-grants-gov

https://simpler.grants.gov
Other
39 stars 9 forks source link

[ADR]: Technology for Data Transformation Steps (ELT) #993

Closed sumiat closed 4 months ago

sumiat commented 8 months ago

Description

Once the https://github.com/HHS/simpler-grants-gov/issues/54 is completed, we will have a table being replicated from Grants.gov's Oracle database to Simpler's Postgres database in order to store and share opportunity data for the API.

In https://github.com/HHS/simpler-grants-gov/issues/990 and in the next stages of the API development, many more tables will be introduced. We will want a process for transforming the current Grants.gov data model into a new data model. The transformation steps will likely include small changes like column types, but also larger changes like table names, concatenating and merging fields, or other more complex transformations.

It is worth noting that alembic owns the schema changes and updates through migrations.

This task is to create an ADR to test the tools and workflow for running those data transformations.

NOTE: This does not encompass the data orchestration tooling such as Airflow/Dagster NOTE: We are using the term ELT vs ETL because DMS is handling the Extract and Load steps.

Approvers

@lucasmbrown-usds Julius Narmada @widal001

Options

Decision Criteria

Review the decision criteria with key stakeholders (Lucas, Narmada, Julius, Billy) before starting this ADR

Definition of Done

widal001 commented 8 months ago

Excited for this ADR! I've listed some additional options and decision criteria I would recommend incorporating into this ADR:

Additional Options

Decision drivers

sumiat commented 8 months ago

Thanks @widal001! I added your comment to the body of the issue.

acouch commented 7 months ago

@widal001 noting I pulled the orchestration tools out of this, as the transformation tool and orchestration tool can be decoupled. I like your list and would include them in an upcoming orchestration tool ADR!

lucasmbrown-usds commented 7 months ago

Aaron mentioned https://hub.meltano.com/singer/spec/ as well. I'm not sure if that would be part of the transformation layer or another layer, but figured I'd mention it here for everyone's situational awareness.

acouch commented 4 months ago

I'm combining this with https://github.com/HHS/simpler-grants-gov/issues/1278 and will have a single ADR update that captures our loading from Oracle, orchestration with step functions, and transformation with python.