NYCPlanning / data-engineering

Primary repository for NYC DCP's Data Engineering team

ZTL QA issue: high qc_bbldiff.shp record count #906

Closed jackrosacker closed 1 week ago

jackrosacker commented 2 weeks ago

I haven't covered every avenue on our side yet, but I'm flagging @fvankrieken early so we can look in parallel, and in case you know of a change that could be responsible.

Issue

The qc_bbldiff shapefile contains 6,000+ lots (it typically contains ~200). A quick examination shows that:

Next Steps

fvankrieken commented 2 weeks ago

I'll start comparing dtm between last month's and this month's builds.

fvankrieken commented 1 week ago

This was caused by the way ztl does QA. QA tables are created in a persisted db where every ztl is archived at the end of each build. Last month, after ztl was published, a new build was run using the mid-cycle dtm that I had to reproject. I'm not sure whether something went wrong ingesting that dtm, but that build (from nightly qa) was archived in the ztl qa db. When a new build (with a fixed dtm) was run this month, it was compared not to last month's published ztl but to this archived nightly_qa build, which was never published.

I reran a job recreating last month's build (pinning source versions) to replace last month's archive, then reran this month's build with new data. The qc_bbldiffs file looks as expected now.
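
For illustration only, here is a minimal sketch of the baseline mix-up described above. It is not the actual ztl QA code: the `ArchivedBuild` record, its fields, the two selector functions, and the sample archive entries are all hypothetical. It shows how picking "the most recently archived build" as the comparison baseline can select an unpublished nightly_qa build, while filtering on a published flag selects the intended one.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ArchivedBuild:
    # All fields are hypothetical; the real QA db schema is not shown in this thread.
    build_name: str   # e.g. "main" or "nightly_qa"
    published: bool   # whether this build was actually published
    archived_at: str  # ISO 8601 timestamp; same-format strings sort chronologically


def latest_archived(builds: Iterable[ArchivedBuild]) -> Optional[ArchivedBuild]:
    """Baseline = whatever was archived last, published or not."""
    return max(builds, key=lambda b: b.archived_at, default=None)


def latest_published(builds: Iterable[ArchivedBuild]) -> Optional[ArchivedBuild]:
    """Baseline = the most recently archived build that was published."""
    candidates = (b for b in builds if b.published)
    return max(candidates, key=lambda b: b.archived_at, default=None)


# Made-up archive contents mirroring the sequence of events in the comment:
# a published build, followed later by an unpublished nightly_qa build.
archive = [
    ArchivedBuild("main", True, "2000-01-05T12:00:00"),
    ArchivedBuild("nightly_qa", False, "2000-01-20T03:00:00"),
]

naive_baseline = latest_archived(archive)
intended_baseline = latest_published(archive)
```

With these sample entries, `naive_baseline` is the nightly_qa build and `intended_baseline` is the published one. Whether the QA db records a published flag at all is an assumption; the fix actually applied here was to replace the bad archive entry rather than change the selection logic.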