Closed amenity closed 3 years ago
@sergiogcx - can you estimate this?
@amenity I do not understand the database design very well, not sure I am the best person to work on this.
I am happy to do it anyway, let's see how far I can get.
@sergiogcx - sorry, just seeing your comment now. I really meant that you could set it up in Google Sheets and specify the structure you'd like to see in the data dictionary. I envision everyone working on this collaboratively, as we build out features.
@sergiogcx — does ☝🏻 clarification help with this? I might be misunderstanding myself, but I think the idea was to create a template defining the type of data you'd like recorded for reference. No expectation you would be filling it in by yourself.
@amenity Ah, sorry about that. I have not yet spent any time on this, my mind has been focused elsewhere.
Let me begin this right away.
@amenity @mateoclarke This is the first draft that was generated semi-automatically following a similar pattern we had with VisionZero's documentation.
Each of those tabs was generated using this SQL query:
SELECT '-' AS humanized_name,
isc.column_name,
isc.udt_name AS data_type,
COALESCE(isc.column_default, '-') AS column_default,
COALESCE(pks.constraint_type, '-') AS constraint_type,
(CASE WHEN pks.foreign_table_name != isc.table_name THEN
CONCAT(pks.foreign_table_name, '.', pks.foreign_column_name)
ELSE '-' END) AS foreign_entity,
(CASE WHEN isc.is_nullable = 'YES' THEN 'Yes' Else 'No' END) AS is_nullable,
(CASE WHEN isc.character_maximum_length IS NULL THEN '-' ELSE concat('', character_maximum_length) END) AS char_max_length,
(CASE WHEN isc.numeric_precision IS NULL THEN '-' ELSE concat('', numeric_precision) END) AS numeric_precision,
'-' AS description
FROM information_schema.columns AS isc
LEFT JOIN (
SELECT
tc.table_name,
c.column_name,
tc.constraint_type,
ccu.table_schema AS foreign_table_schema,
ccu.table_name AS foreign_table_name,
ccu.column_name AS foreign_column_name
FROM information_schema.table_constraints tc
JOIN information_schema.constraint_column_usage AS ccu USING (constraint_schema, constraint_name)
JOIN information_schema.columns AS c ON c.table_schema = tc.constraint_schema
AND tc.table_name = c.table_name AND ccu.column_name = c.column_name
) AS pks ON pks.table_name = isc.table_name AND isc.column_name = pks.column_name
WHERE isc.table_name = 'moped_table_name';
@amenity It may be possible to generate markdown files automatically every time we make a change to the database. This is possible if we make comments for each table and each column. We would also need to write a python script, it should not be difficult at all.
Let me know if this is something you would be interested for me to do.
This is how I got the list of tables (index):
SELECT row_number() over (ORDER BY t.table_name),
t.table_name,
pg_catalog.obj_description(pgc.oid, 'pg_class')
FROM information_schema.tables AS t
INNER JOIN pg_catalog.pg_class pgc
ON t.table_name = pgc.relname
WHERE table_schema = 'public'
AND table_name LIKE 'moped_%'
ORDER BY table_name;
Outputs this:
row_number table_name obj_description
1 moped_categories Standardized categories for projects
2 moped_city_fiscal_years Standardized fiscal years maintained by city
3 moped_components Project facilities or components
4 moped_entity Standardized list of project-related entities, including workgroups, COA partners, and sponsors
5 moped_fund_opp Standardized list of identified funding opportunities
6 moped_fund_source_cat Standardized funding source categories
7 moped_fund_sources Standardized funding sources for projects
8 moped_group Standardized list of COA groupings or initiatives
9 moped_milestones Standardized list of project milestones
10 moped_phases Standardized list of project phases
11 moped_proj_categories List of related project categories
12 moped_proj_communication Latest Project Communication
13 moped_proj_components Moped Project Components Table -- formerly facilities in Interim Project Database
14 moped_proj_dates List of associated project dates, likely from important project phases or milestones
15 moped_proj_entities List of entities related to a project, including internal and external groups
16 moped_proj_financials Financial data related to a project -- may be sourced from Moped or Controller's Office or eCapris
17 moped_proj_fiscal_years Moped Project Fiscal Years
18 moped_proj_fund_opp Identified funding opportunities for a given project
19 moped_proj_fund_source Main source for a project's funding
20 moped_proj_groups moped_project_groups
21 moped_proj_location moped_project_location
22 moped_proj_milestones
23 moped_proj_notes moped_project_notes
24 moped_proj_partners moped_project_partners
25 moped_proj_personnel Project Team members
26 moped_proj_phases
27 moped_proj_sponsors All sponsors of a moped project regardless of city involvement
28 moped_proj_status_history List of status changes throughout a given project, including status changes brought about by milestones or phases
29 moped_proj_status_notes List of notes tied to status changes in a given project
30 moped_proj_timeline Most recent progress regarding a given project's timeline (1:1 relationship with project)
31 moped_project Moped Project Table -- Parent entity of most in the database, primary key is foreign key for many entities
32 moped_project_roles Project roles for project staffing selection
33 moped_status Standardized list of project statuses
34 moped_users Standardized list of city of Austin employees
35 moped_workgroup
Based on our conversation today, seems like we're all on the same page that SchemaSpy tool is going to go above and beyond in solving our needs for the data dictionary. Should we close out this issue and open backlog task(s) to implement some automated update/CI features for SchemaSpy? cc/ @sergiogcx
Start with a Google Sheet, add the tables we're already using.
Ideally this will be something we can share with users along with other user documentation in Gitbook/Confluence/ etc.