cityofaustin / atd-data-tech

Austin Transportation Data & Technology Services
17 stars 2 forks source link

Create a Moped data dictionary #4744

Closed amenity closed 3 years ago

amenity commented 3 years ago

Start with a Google Sheet, add the tables we're already using.

Ideally this will be something we can share with users along with other user documentation in Gitbook/Confluence/ etc.

amenity commented 3 years ago

@sergiogcx - can you estimate this?

sergiogcx commented 3 years ago

@amenity I do not understand the database design very well, not sure I am the best person to work on this.

I am happy to do it anyway, let's see how far I can get.

amenity commented 3 years ago

@sergiogcx - sorry, just seeing your comment now. I really meant that you could set it up in Google Sheets and specify the structure you'd like to see in the data dictionary. I envision everyone working on this collaboratively, as we build out features.

amenity commented 3 years ago

@sergiogcx — does ☝🏻 clarification help with this? I might be misunderstanding myself, but I think the idea was to create a template defining the type of data you'd like recorded for reference. No expectation you would be filling it in by yourself.

sergiogcx commented 3 years ago

@amenity Ah, sorry about that. I have not yet spent any time on this, my mind has been focused elsewhere.

Let me begin this right away.

sergiogcx commented 3 years ago

@amenity @mateoclarke This is the first draft that was generated semi-automatically following a similar pattern we had with VisionZero's documentation.

https://docs.google.com/spreadsheets/d/19f1S18qRkYkfMp3JOBFTku0zX_5levOynQ6WALNNc0o/edit#gid=1061236014

Each of those tabs was generated using this SQL query:

SELECT '-' AS humanized_name,
       isc.column_name,
       isc.udt_name AS data_type,
       COALESCE(isc.column_default, '-') AS column_default,
       COALESCE(pks.constraint_type, '-') AS constraint_type,
       (CASE WHEN pks.foreign_table_name != isc.table_name THEN
           CONCAT(pks.foreign_table_name, '.', pks.foreign_column_name)
           ELSE '-' END) AS foreign_entity,
       (CASE WHEN isc.is_nullable = 'YES' THEN 'Yes' Else 'No' END) AS is_nullable,
       (CASE WHEN isc.character_maximum_length IS NULL THEN '-' ELSE concat('', character_maximum_length) END) AS char_max_length,
       (CASE WHEN isc.numeric_precision IS NULL THEN '-' ELSE concat('', numeric_precision) END) AS numeric_precision,
       '-' AS description

FROM information_schema.columns AS isc

    LEFT JOIN (
        SELECT
                tc.table_name,
                c.column_name,
                tc.constraint_type,
                ccu.table_schema AS foreign_table_schema,
                ccu.table_name AS foreign_table_name,
                ccu.column_name AS foreign_column_name
        FROM information_schema.table_constraints tc
        JOIN information_schema.constraint_column_usage AS ccu USING (constraint_schema, constraint_name)
        JOIN information_schema.columns AS c ON c.table_schema = tc.constraint_schema
          AND tc.table_name = c.table_name AND ccu.column_name = c.column_name
    ) AS pks ON pks.table_name = isc.table_name AND isc.column_name = pks.column_name

WHERE isc.table_name = 'moped_table_name';
sergiogcx commented 3 years ago

@amenity It may be possible to generate markdown files automatically every time we make a change to the database. This is possible if we make comments for each table and each column. We would also need to write a python script, it should not be difficult at all.

Let me know if this is something you would be interested for me to do.

sergiogcx commented 3 years ago

This is how I got the list of tables (index):

SELECT row_number() over (ORDER BY t.table_name),
       t.table_name,
       pg_catalog.obj_description(pgc.oid, 'pg_class')

FROM information_schema.tables AS t
INNER JOIN pg_catalog.pg_class pgc
    ON t.table_name = pgc.relname
WHERE table_schema = 'public'
    AND table_name LIKE 'moped_%'
ORDER BY table_name;

Outputs this:

row_number  table_name  obj_description
1   moped_categories    Standardized categories for projects
2   moped_city_fiscal_years Standardized fiscal years maintained by city
3   moped_components    Project facilities or components
4   moped_entity    Standardized list of project-related entities, including workgroups, COA partners, and sponsors
5   moped_fund_opp  Standardized list of identified funding opportunities
6   moped_fund_source_cat   Standardized funding source categories
7   moped_fund_sources  Standardized funding sources for projects
8   moped_group Standardized list of COA groupings or initiatives
9   moped_milestones    Standardized list of project milestones
10  moped_phases    Standardized list of project phases
11  moped_proj_categories   List of related project categories
12  moped_proj_communication    Latest Project Communication
13  moped_proj_components   Moped Project Components Table -- formerly facilities in Interim Project Database
14  moped_proj_dates    List of associated project dates, likely from important project phases or milestones
15  moped_proj_entities List of entities related to a project, including internal and external groups
16  moped_proj_financials   Financial data related to a project -- may be sourced from Moped or Controller's Office or eCapris
17  moped_proj_fiscal_years Moped Project Fiscal Years
18  moped_proj_fund_opp Identified funding opportunities for a given project
19  moped_proj_fund_source  Main source for a project's funding
20  moped_proj_groups   moped_project_groups
21  moped_proj_location moped_project_location
22  moped_proj_milestones   
23  moped_proj_notes    moped_project_notes
24  moped_proj_partners moped_project_partners
25  moped_proj_personnel    Project Team members
26  moped_proj_phases   
27  moped_proj_sponsors All sponsors of a moped project regardless of city involvement
28  moped_proj_status_history   List of status changes throughout a given project, including status changes brought about by milestones or phases
29  moped_proj_status_notes List of notes tied to status changes in a given project
30  moped_proj_timeline Most recent progress regarding a given project's timeline (1:1 relationship with project)
31  moped_project   Moped Project Table -- Parent entity of most in the database, primary key is foreign key for many entities
32  moped_project_roles Project roles for project staffing selection
33  moped_status    Standardized list of project statuses
34  moped_users Standardized list of city of Austin employees
35  moped_workgroup 
mateoclarke commented 3 years ago

Based on our conversation today, seems like we're all on the same page that SchemaSpy tool is going to go above and beyond in solving our needs for the data dictionary. Should we close out this issue and open backlog task(s) to implement some automated update/CI features for SchemaSpy? cc/ @sergiogcx