NYCPlanning / design

A repository for the Design team in the Geographic Data and Engineering team of the NYC Department of City Planning.

assist with DE data documentation #37

Closed damonmcc closed 1 month ago

damonmcc commented 3 months ago

DE issue: https://github.com/NYCPlanning/data-engineering/issues/944

DE would like to automate the generation of data documentation.

current state

The Data Engineering (DE) and GIS teams make and distribute data. In most cases, documentation is made and distributed along with the data and must change to reflect changes in the data.

That documentation is currently created and modified manually. The current forms of documentation which are distributed with data are:

The Data Catalog in the DE wiki has links to the destinations these data products and their documentation are distributed to.

desired state

DE would like to minimize or eliminate the manual modification of data documentation. In general, DE would like to:

  1. create templates of data documentation
  2. use the new data product metadata to populate templates
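A minimal sketch of the two steps above, assuming the metadata is available as a plain dictionary (in practice it might be loaded from a YAML file). The field names, the template text, and `render_readme` are hypothetical and only illustrate the idea; they are not DE's actual schema or code.

```python
from string import Template

# Hypothetical metadata record for one data product (assumed structure).
metadata = {
    "name": "Example Dataset",
    "description": "An illustrative data product.",
    "columns": [
        {"name": "bbl", "description": "Borough, block, and lot identifier"},
        {"name": "address", "description": "Street address"},
    ],
}

# Step 1: a documentation template with named placeholders.
README_TEMPLATE = Template("# $name\n\n$description\n\n## Columns\n\n$columns\n")


def render_readme(record: dict) -> str:
    """Step 2: populate the template from a metadata record."""
    column_lines = "\n".join(
        f"- {col['name']}: {col['description']}" for col in record["columns"]
    )
    return README_TEMPLATE.substitute(
        name=record["name"],
        description=record["description"],
        columns=column_lines,
    )


readme = render_readme(metadata)
print(readme)
```

With this shape, a change to the data only requires editing the metadata record; the readme is regenerated rather than edited by hand.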

This is an opportunity to standardize and improve the design of data documentation.

Some documentation has strict formatting rules (e.g. data dictionaries in the Attachments of Open Data pages). But we can determine the form and content of all other documentation.

jessicashanshanhuang commented 3 months ago

initial questions/thoughts:

intro to metadata

damonmcc commented 3 months ago

what md flavors are currently being used? i'm assuming default is GFM (github flavor).

we currently use GFM for all our markdown files, but that's just because we only make markdown files to be viewed in github lol

jessicashanshanhuang commented 3 months ago

Follow up from Matt:

I met with Amanda yesterday and she confirmed that the design fellow should focus on the design of a new DCP Readme file. Data Engineering and the engineering fellow should focus on developing code to distribute metadata from YAML files to the various metadata endpoints: DCP Readme, OTI-required Data Dictionary, and ESRI metadata.
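One way to picture the "one metadata source, several endpoints" split described above is a small registry of renderers, one per output. This is only a sketch under assumptions: the record would really come from a YAML file, and the field names, column headers, and function names here are hypothetical rather than the required OTI or ESRI formats.

```python
import csv
import io

# Hypothetical metadata record, standing in for the contents of a YAML file.
record = {
    "name": "Example Dataset",
    "description": "An illustrative data product.",
    "columns": [
        {"name": "bbl", "description": "Borough, block, and lot identifier"},
        {"name": "address", "description": "Street address"},
    ],
}


def to_readme(rec: dict) -> str:
    """Render a short readme as markdown text."""
    return f"# {rec['name']}\n\n{rec['description']}\n"


def to_data_dictionary_csv(rec: dict) -> str:
    """Render one row per column as CSV text (assumed two-column layout)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Column Name", "Description"])
    for col in rec["columns"]:
        writer.writerow([col["name"], col["description"]])
    return buffer.getvalue()


# Each endpoint gets its own renderer; new endpoints are added here.
RENDERERS = {
    "readme": to_readme,
    "data_dictionary": to_data_dictionary_csv,
}


def render_all(rec: dict) -> dict:
    """Produce every registered output from the same metadata record."""
    return {name: render(rec) for name, render in RENDERERS.items()}


outputs = render_all(record)
print(outputs["data_dictionary"])
```

Keeping the renderers separate means the readme design can change freely while the strictly formatted outputs stay untouched.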

I’m following up with templates and examples of the different metadata, Readme and Data Dictionaries used here at DCP.

Templates (see this SharePoint folder) – if fellows do not have access, please reach out and I’ll send copies directly.

  - Open Data Dictionary (Individual dataset)
  - Open Data Dictionary (Collection)
  - DCP Readme
  - ArcGIS metadata

Examples (standard):

  - ArcGIS Metadata (print out) – Zoning Districts
  - DCP Readme – COLP
  - Open Data Dictionary (Individual) – Housing Database: Project Level Files (downloads .xlsx file)
  - Open Data Dictionary (Collection) – Capital Projects Database (downloads .xlsx file)

Examples (ad hoc / non-standard)

  - PLUTO Data Dictionary
  - PLUTO Readme
  - PLUTO Change File
  - Facilities Readme
  - Facilities Data Dictionary – added tabs (downloads .xlsx file)
  - Zoning Tax Lot Database Data Dictionary/Readme

You can explore all our data and metadata offerings on the BYTES of the BIG APPLE site.

jessicashanshanhuang commented 3 months ago

@damonmcc do we have a specific timeline of when we should get the initial research phase done by?

jessicashanshanhuang commented 3 months ago

Observations:

Questions:

jessicashanshanhuang commented 3 months ago

check-in 07/24 notes:

jessicashanshanhuang commented 3 months ago

broke down the two major deliverables we will be working on into these issues:

jessicashanshanhuang commented 3 months ago

Metadata PDF Heng produced from PLUTO

Questions/Observations:

Ideas:

hey @damonmcc, @wndyli here are my thoughts/questions for the check-in, if you could relay those that would be great!

wndyli commented 3 months ago

going off of jessica's comment regarding the pluto metadata pdf:

questions/observations:

ideas: