NYCPlanning / db-zoningtaxlots

Zoning Tax Lot Database (ZTLDB) -- associates zoning designations with lots from the Department of Finance Digital Tax Map
https://edm-data-engineering.nycplanningdigital.com/?page=Zoning+Tax+Lots

qc_bbldiffs file fails to include two BBLs where ZMA/ZDs have changed/been created #130

Closed caseysmithpgh closed 1 year ago

caseysmithpgh commented 1 year ago

Two BBLs that should be in Feb. open data qc_bbldiffs file were not included.

Findings from our initial investigation:

The image below shows the new Feb. Zoning Map Amendment (blue highlighted selection). Lots intersecting this ZMA should be included in the qc_bbldiffs layer, but are not. Happy to follow up with a screen share or in-person walk-through of the issue.

[image]

AmandaDoyle commented 1 year ago

@caseysmithpgh I don't see either of the BBLs you listed in the DTM that's on Digital Ocean. That would explain why they are not in the reports. I'm happy to hop on a call to talk through and visualize your process.

caseysmithpgh commented 1 year ago

@AmandaDoyle interesting. Ok, I will take a look through the previous DTM and current DTM (and those that fell in between) to try to isolate the issue.

jackrosacker commented 1 year ago

Hey Amanda, we just ran some double checks and are seeing the two BBLs in the latest DTM. We've checked most of the things we can think of on this end, and next steps would probably be to see what DE can dig up or to sit down together and figure out where we're going wrong.

This screenshot is from ArcGIS and shows the data source in Cyberduck, the file pulled up in the map view, and the lots highlighted. Trying to be explicit with a screenshot in case we're looking at different datasets or something.

[image]

Also +@croswell81

AmandaDoyle commented 1 year ago

@jackrosacker @caseysmithpgh I was looking at the dof_dtm dataset in edm-recipes here. I'm happy to meet to talk this through (unfortunately today is not a good day with meetings). If I'm not free, please feel free to troubleshoot with someone in DE.

AmandaDoyle commented 1 year ago

@damonmcc for awareness and he'll be point of contact

damonmcc commented 1 year ago

gonna make sure the dof_dtm data used by the build action (in edm-recipes) is identical to the original source data (in edm-publishing)

if it isn't, I'll run our archiving and start a ZTLDB build to see if that was the root cause 🤞🏾

damonmcc commented 1 year ago

seeing significant differences between input data, specifically dof_dtm

from asking Max, the # of rows shouldn't change so much

damonmcc commented 1 year ago

@jackrosacker

here's a screenshot of the recent DOF DTM data you all have been uploading monthly to edm-publishing:

[image]

jackrosacker commented 1 year ago

@damonmcc I'm looking at the datasets pre-Digital Ocean upload and I'm seeing the following counts:

| date | row count | column count |
| --- | --- | --- |
| 2022-12-30 | 858,328 | 33 |
| 2023-01-27 | 858,318 | |
| 2023-02-03 | 858,316 | 33 |
| 2023-03-03 | 858,279 | 33 |

Edit: fixed to show 1/27 instead of 2/3

(for GIS internal reference, all three above are from the scrape output locations on M:\DOF_Tax_Maps)

[image]

jackrosacker commented 1 year ago

@damonmcc I also confirmed the row counts for the corresponding shapefiles in edm-publishing for the three dates, all counts are the same as for the feature classes in the table above
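
The row-count comparison described in this thread can be sketched in a few lines of Python. This is only an illustration, not part of the ZTLDB build: the function name `flag_row_count_changes` and the 1% threshold are assumptions, and the counts are the pre-upload figures reported above.

```python
def flag_row_count_changes(counts, max_pct_change=1.0):
    """Return (previous, current, pct_change) for consecutive versions
    whose row count changes by more than max_pct_change percent.

    `counts` maps an ISO date string to a row count. The threshold is an
    arbitrary illustrative default, not a value used by the real build.
    """
    flagged = []
    versions = sorted(counts)  # ISO dates sort chronologically as strings
    for prev, curr in zip(versions, versions[1:]):
        pct = (counts[curr] - counts[prev]) / counts[prev] * 100
        if abs(pct) > max_pct_change:
            flagged.append((prev, curr, round(pct, 2)))
    return flagged


# Row counts reported in this thread for the pre-upload datasets.
dtm_counts = {
    "2022-12-30": 858_328,
    "2023-01-27": 858_318,
    "2023-02-03": 858_316,
    "2023-03-03": 858_279,
}

print(flag_row_count_changes(dtm_counts))  # prints []
```

None of the month-to-month changes in the source counts come close to the threshold, which is consistent with the expectation that the number of rows should not change much. A dataset with roughly half the rows would be flagged immediately.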

damonmcc commented 1 year ago

I'm currently trying to convert the shapefile to a sql file locally, using as little data-library code as possible to minimize the likelihood of losing rows, so we can quickly upload the data needed to build ZTLDB

so far, 20220429 appears to be the last source dataset in edm-publishing. 20220603 is the very next dataset and has the "half file" issue we're seeing

damonmcc commented 1 year ago

use of QGIS to reproject and convert a shapefile seemed feasible, but a build with the result failed because of how QGIS structures the sql file (every line is a long INSERT statement rather than our usual lines of text)

reprojecting with command line tools (ogr2ogr and shp2pgsql) is in progress and will hopefully produce a sql file the ZTLDB build process can use

update: seems promising!
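
For readers unfamiliar with the two tools, here is a minimal Python sketch of what a reproject-then-convert step might look like. It is not the command sequence actually used in this issue. The file names, table name, and target SRS (`EPSG:4326`) are all assumptions. The `-D` option asks shp2pgsql for PostgreSQL dump (COPY) format instead of INSERT statements; whether that matches the format the build expects is also an assumption.

```python
import shutil
import subprocess


def build_commands(src_shp, reprojected_shp, table, target_srs="EPSG:4326"):
    """Build argument lists for ogr2ogr (reproject) and shp2pgsql (to SQL).

    All paths, the table name, and the target SRS are hypothetical inputs.
    """
    srid = target_srs.split(":")[1]
    reproject = [
        "ogr2ogr",
        "-f", "ESRI Shapefile",  # output driver
        "-t_srs", target_srs,    # reproject to the target SRS
        reprojected_shp,         # destination comes first
        src_shp,                 # source comes second
    ]
    to_sql = [
        "shp2pgsql",
        "-D",                    # PostgreSQL dump (COPY) format
        "-s", srid,              # SRID to assign to the geometries
        reprojected_shp,
        table,
    ]
    return reproject, to_sql


def run(src_shp, reprojected_shp, table, sql_path):
    """Run both steps, writing the shp2pgsql output to sql_path."""
    for tool in ("ogr2ogr", "shp2pgsql"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} is not installed or not on PATH")
    reproject, to_sql = build_commands(src_shp, reprojected_shp, table)
    subprocess.run(reproject, check=True)
    with open(sql_path, "w") as out:
        subprocess.run(to_sql, check=True, stdout=out)
```

Building the argument lists separately from running them makes it easy to print and review the exact commands before touching any data.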

damonmcc commented 1 year ago

very promising. here's the current approach

most recently, the build failed because some geometries are invalid. fixing by reprojecting with --makevalid flag

but got this error

ERROR 1: Attempt to write non-polygon (LINESTRING) geometry to POLYGON type shapefile.
ERROR 1: Unable to write feature 394957 from layer dof_dtm_tax_lot_polygon.

damonmcc commented 1 year ago

@jackrosacker @caseysmithpgh a successful build!

with new exports in edm-publishing/db-zoningtaxlots/latest/, would love to have extra eyes on inspecting the results. perhaps worth clarifying February vs March releases since I'm not sure which one we're on and whether the input data for this build was ideal

caseysmithpgh commented 1 year ago

@damonmcc @jackrosacker I took a look at edm-publishing/db-zoningtaxlots/latest/qc_bbldiffs.csv and the record count is significantly higher, and more in line with what we would typically expect. The missing BBLs that originally tipped us off to the issue are also included, so on that front this output looks good to me!

Many thanks Damon!
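
A check like the one described above (confirming that specific BBLs appear in qc_bbldiffs.csv) could be scripted roughly as follows. This is a sketch under assumptions: the column name `bbl` is a guess at the file's schema, and the BBL values are made-up placeholders, since the two BBLs from this issue are not listed in the text.

```python
import csv
import io


def missing_bbls(csv_text, expected_bbls, bbl_column="bbl"):
    """Return the expected BBLs that do not appear in the CSV text.

    `bbl_column` is an assumed column name, not confirmed against the
    real qc_bbldiffs.csv schema.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    present = {row[bbl_column].strip() for row in reader}
    return sorted(set(expected_bbls) - present)


# Placeholder data: these BBLs are hypothetical, not the ones from this issue.
sample_csv = "bbl\n1000010001\n1000010003\n"
print(missing_bbls(sample_csv, ["1000010001", "1000010002"]))
# prints ['1000010002']
```

An empty list means every expected BBL is present in the file.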

caseysmithpgh commented 1 year ago

@damonmcc confirming that I'm clear to QA items that are in edm-publishing/db-zoningtaxlots/latest/

damonmcc commented 1 year ago

@caseysmithpgh yup!

and looks like this is meant to be for the February release so, in case it's helpful, here's a link to the source data versions in the build logs

damonmcc commented 1 year ago

per Data Update issue https://github.com/NYCPlanning/edm-overview/issues/866, ZTLDB has been QAed and pushed to Bytes. closing this issue as complete