HHS / simpler-grants-gov

https://simpler.grants.gov

[ADR]: Dashboard ETL orchestration strategy #1248

Open widal001 opened 4 months ago

widal001 commented 4 months ago

Description

To populate the delivery dashboard with metrics calculated from data pulled from GitHub, we need a strategy for running the analytics pipeline in the analytics/ sub-directory and publishing the results somewhere the dashboard UI can access them.

The goal of this ADR is to document our current approach to extracting and transforming the GitHub data into the dashboard metrics, and to evaluate and recommend an orchestration tool to run this analytics pipeline.

Approvers

Options

Note: Since we already have the analytics pipeline to extract and transform the data needed to calculate the metrics, the main options we're considering are for the orchestration layer (i.e., how we trigger this analytics pipeline).

Decision Criteria

Definition of Done

bretthrosenblatt commented 2 months ago

Maybe consider AppFlow --> Glue? AppFlow has a GA4 connector and drops extracts into S3; Glue (or Lambda/Python) then loads them into Aurora staging tables.
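
As a rough illustration of the Lambda/Python variant of this suggestion, the sketch below shows only the parsing half of such a step: pulling bucket and key out of an S3 event notification and turning a CSV extract into rows for a staging table. The function names, the `source_key` column, and the choice of CSV as the extract format are assumptions made for illustration; the actual write to Aurora is left out.

```python
import csv
import io
from urllib.parse import unquote_plus


def s3_objects_from_event(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    pairs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded in S3 event notifications.
        pairs.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return pairs


def rows_for_staging(csv_text, source_key):
    """Parse a CSV extract into dict rows tagged with the key they came from.

    `source_key` is a hypothetical bookkeeping column, not part of any
    existing schema.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{**row, "source_key": source_key} for row in reader]
```

A handler would call `s3_objects_from_event` on the incoming event, fetch each object, and pass its text to `rows_for_staging` before inserting the rows into the staging table.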

coilysiren commented 1 month ago

I just copied the existing Step Functions setup for this, at least for the sprint reports ETL.

Here => https://github.com/HHS/simpler-grants-gov/blob/e801e00ada24fa51d3bdc7b3a9ffb05114968a55/infra/analytics/service/sfn_sprint_reports.tf
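
For readers unfamiliar with the pattern, here is a minimal sketch of what a single-step Step Functions definition for an ETL job can look like, written as a Python dict in Amazon States Language. It assumes the job runs as an ECS Fargate task; the state name, function name, and parameters are hypothetical and are not taken from the linked Terraform file.

```python
import json


def build_state_machine(cluster_arn, task_definition_arn, subnets):
    """Build a one-state definition that runs an ECS task and waits for it."""
    return {
        "Comment": "Hypothetical sketch: run the sprint reports ETL as one task",
        "StartAt": "RunSprintReports",
        "States": {
            "RunSprintReports": {
                "Type": "Task",
                # The .sync suffix makes the state wait for the task to finish.
                "Resource": "arn:aws:states:::ecs:runTask.sync",
                "Parameters": {
                    "LaunchType": "FARGATE",
                    "Cluster": cluster_arn,
                    "TaskDefinition": task_definition_arn,
                    "NetworkConfiguration": {
                        "AwsvpcConfiguration": {"Subnets": subnets}
                    },
                },
                "End": True,
            }
        },
    }


# Placeholder identifiers; real ARNs and subnet IDs would come from the infra config.
definition = build_state_machine("cluster-arn", "task-def-arn", ["subnet-id"])
definition_json = json.dumps(definition, indent=2)
```

The resulting JSON string is the kind of value a state machine resource takes as its definition; a schedule or other trigger would then start executions.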