NYCPlanning / data-engineering

Primary repository for NYC DCP's Data Engineering team

list of data quality checks for accurate count of units per record in PLUTO #957

Open sf-dcp opened 6 days ago

sf-dcp commented 6 days ago

Come up with an initial list of data checks to identify lots with potentially incorrect unit counts.

sf-dcp commented 6 days ago

PTS data expectations:

How units are computed

Checks

The GH issue mentioned in the description outlines several kinds of problems. Here are some:

The challenge is that these issues can be difficult to identify and require research into exact patterns. Therefore, we will start with one check from the first bullet point: create the check, research the failed cases for patterns, and develop a strategy for handling those cases.

Check description