ccao-data / wiki

Handbook, how-tos, and other documentation

Add overall architecture diagram for the Data Dept. #2

Closed (dfsnow closed this issue 10 months ago)

dfsnow commented 1 year ago

Create an architecture diagram that shows the general structure of the department's data architecture. It should give newcomers an idea of how data flows for the processes the Data Department is responsible for.

jeancochrane commented 10 months ago

@dfsnow I think that I'll store this diagram in the overview for the dbt data catalog once it's done. For now, how does this draft look? Anything I'm missing?

---
title: CCAO data flow diagram
---
flowchart TD
    A[Mainframe] & B[User input] --> C[iasWorld system of record]
    C -- service-sqoop-iasworld --> D(((AWS Athena warehouse)))
    E["Public data sources (e.g. Census, OSM, BetterSchools)"] & F["Private data sources (e.g. RPIE, sales)"] -- R extraction scripts --> D
    D -- R transformation scripts --> D
    D --> I[dbt] --> D
    D --> J[CTAs] --> D
    D ----> K[AWS Glue jobs]
    K ---> L(Ratio stats) -- reporting database --> D
    K ---> M(Res reporting) -- reporting database --> D
    K ---> N(Sales flagging) -- sales database --> D
    D --> O[On-prem modelling and development server] -- Socrata agent --> P[Open data portal]

dfsnow commented 10 months ago

@jeancochrane Extremely solid. It's mainly just missing Tableau and modeling. See my changes below.

---
title: CCAO Data Flow Diagram
---
flowchart TD
    A[Mainframe + AS/400] & B[User input] --> C[(iasWorld)]
    C -- service-sqoop-iasworld --> D[(AWS Athena\nwarehouse)]
    E["Public data sources\n(e.g. Census, OSM, GTFS)"] & F["Private data sources\n(e.g. RPIE, sales)"] -- R extraction scripts --> D
    D -- R transformation scripts --> D
    D --> I[dbt] --> D
    D --> J[CTAs] --> D
    D ----> K[AWS Glue jobs]
    K ---> L(Ratio stats) -- reporting database --> D
    K ---> M(Res reporting) -- reporting database --> D
    K ---> N(Sales flagging) -- sale database --> D
    D --> O[On-prem modeling\nand dev. server] -- Socrata agent --> P[Open data portal]
    O -- R modeling pipeline --> D
    L & M --> Q[Tableau reports]
    D -- Scheduled extracts --> Q
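
For newcomers who want to check their reading of the revised diagram, here is a minimal sketch that transcribes its edges into a plain Python list and derives where data enters and leaves the system. The edge list is a hand transcription of the flowchart above, and the names `EDGES`, `entry_points`, `end_points`, and `direct_inputs` are hypothetical helpers made up for this illustration; none of this is part of any CCAO repository.

```python
# Edges transcribed by hand from the revised flowchart: (source, target, label).
# Node names are shortened versions of the diagram labels (an assumption of this sketch).
WAREHOUSE = "AWS Athena warehouse"
SERVER = "On-prem modeling and dev. server"

EDGES = [
    ("Mainframe + AS/400", "iasWorld", None),
    ("User input", "iasWorld", None),
    ("iasWorld", WAREHOUSE, "service-sqoop-iasworld"),
    ("Public data sources", WAREHOUSE, "R extraction scripts"),
    ("Private data sources", WAREHOUSE, "R extraction scripts"),
    (WAREHOUSE, WAREHOUSE, "R transformation scripts"),
    (WAREHOUSE, "dbt", None),
    ("dbt", WAREHOUSE, None),
    (WAREHOUSE, "CTAs", None),
    ("CTAs", WAREHOUSE, None),
    (WAREHOUSE, "AWS Glue jobs", None),
    ("AWS Glue jobs", "Ratio stats", None),
    ("AWS Glue jobs", "Res reporting", None),
    ("AWS Glue jobs", "Sales flagging", None),
    ("Ratio stats", WAREHOUSE, "reporting database"),
    ("Res reporting", WAREHOUSE, "reporting database"),
    ("Sales flagging", WAREHOUSE, "sale database"),
    (WAREHOUSE, SERVER, None),
    (SERVER, "Open data portal", "Socrata agent"),
    (SERVER, WAREHOUSE, "R modeling pipeline"),
    ("Ratio stats", "Tableau reports", None),
    ("Res reporting", "Tableau reports", None),
    (WAREHOUSE, "Tableau reports", "Scheduled extracts"),
]


def entry_points(edges):
    """Nodes that only send data onward (nothing flows into them)."""
    sources = {src for src, _, _ in edges}
    targets = {dst for _, dst, _ in edges}
    return sorted(sources - targets)


def end_points(edges):
    """Nodes that only receive data (nothing flows out of them)."""
    sources = {src for src, _, _ in edges}
    targets = {dst for _, dst, _ in edges}
    return sorted(targets - sources)


def direct_inputs(edges, node):
    """Nodes with an edge pointing directly at `node`, ignoring self-loops."""
    return sorted({src for src, dst, _ in edges if dst == node and src != node})


if __name__ == "__main__":
    print("Entry points:", entry_points(EDGES))
    print("End points:", end_points(EDGES))
    print("Feeds Tableau:", direct_inputs(EDGES, "Tableau reports"))
```

Read this way, the diagram has four entry points (the mainframe, user input, and the public and private data sources) and two end points (the open data portal and Tableau reports), with the Athena warehouse sitting between them as the hub that nearly every process reads from and writes back to.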