NYCPlanning / data-engineering

Primary repository for NYC DCP's Data Engineering team

Pluto 24v3 #976

Open sf-dcp opened 1 month ago

sf-dcp commented 1 month ago

Main tasks

Data loading

Manual Updates

Updated 2x a year typically in June and December

Automated Updates

Open data automated pull

Check here to see the latest run

DOF Automated Pull and Number of Buildings

Updated with Quarterly updates (check here)

Updated with Zoning Taxlots

(check here for latest run).

These are all produced by GIS, who typically update them sometime in the first week of each month. Check in with them before archiving with data library

Never Updated (Safe to ignore)

damonmcc commented 1 month ago

waiting for PR https://github.com/NYCPlanning/data-engineering/pull/977 for a PLUTO zoning fix and for COLP to pass QA

sf-dcp commented 4 weeks ago

Zoning fix has been merged. COLP is still in QA. We decided to proceed with PLUTO build based on the previous COLP version (Dec '23) to avoid release delays

sf-dcp commented 4 weeks ago

Preliminary DE QA review based on the QAQC page ("last of same version type" comparison type):

Build: sf-pluto-24v3

Aggregate Changes

Expected Value Comparison

Outlier Analysis

New and Vanished BBLs

fvankrieken commented 4 weeks ago

Couple random comments

So seems like two big things to flag for gis

fvankrieken commented 4 weeks ago

The sanitation districts is odd - looks like this only comes from dsny_frequencies? Which really hasn't changed in the last year

sf-dcp commented 3 weeks ago

With GIS for QA review.

caseysmithpgh commented 2 weeks ago

QA'ed PLUTO today. Everything seems to be in order, just going to have Jack take a look at it tomorrow for final confirmation.

Two quick flags:

FYI @NYCPlanning/data-engineering @jackrosacker @croswell81

fvankrieken commented 2 weeks ago

Copying my note on zd3/4 from last time - only 215 records have zd3, only 13 have zd4.

And we actually have a single zd3 change this time, which works out to a higher percentage of zd3 lots that had zd3 change than normal lots had zd1 change. So I don't think that's something we need to worry about.
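
For a rough sense of scale, the comparison above can be written as a per-field change rate. This is only a sketch: the counts (1 change among 215 lots with zd3) come from this thread, the helper name `change_rate` is made up for illustration, and the zd1 figures are not stated here, so they are left out.

```python
def change_rate(changed: int, populated: int) -> float:
    """Share of lots with a populated field whose value changed, as a percentage."""
    if populated == 0:
        raise ValueError("no lots have this field populated")
    return 100 * changed / populated


# 1 zd3 change among the 215 lots that have zd3 at all (figures from this thread)
zd3_rate = change_rate(changed=1, populated=215)
print(f"zd3 change rate: {zd3_rate:.2f}%")  # prints "zd3 change rate: 0.47%"
```

With so few populated lots, a single change moves the rate a lot, which is why the percentage looks large next to zd1.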

caseysmithpgh commented 2 weeks ago

Sweet thanks. I'll note this in our QA doc so it doesn't come up again next time.

caseysmithpgh commented 2 weeks ago

FYI @sf-dcp

GIS is signing off--with the caveat that DE should verify the lot area (mistakenly labeled floor area, according to finn) increase of BBL 3000160017. Barring any concern on DE's end, ready for promotion and subsequent publication.

sf-dcp commented 2 weeks ago

Hi @caseysmithpgh , @fvankrieken and I checked the lot and the result is... interesting.

Unclipped: image

Is the lot area in PLUTO supposed to include land area only or its total area?

caseysmithpgh commented 2 weeks ago

Hmm--interesting case. I don't know the answer, happy to take a look at the data dictionary to see if there is something in there about it.

caseysmithpgh commented 2 weeks ago

The Data dictionary says the following:

Total area of the tax lot, expressed in square feet rounded to the nearest integer. LOT AREA contains street beds when the tax lot contains “paper streets” i.e., street mapped but not built. If the tax lot is not an irregularly shaped lot (see IRREGULAR LOT CODE) the Department of Finance calculates the LOT AREA by multiplying the LOT FRONTAGE by the LOT DEPTH. If the tax lot is irregularly shaped, DOF calculates the LOT AREA from the Digital Tax Map. If PTS contains a zero value for LOT AREA, this field is changed to show the area of the tax lot’s geometric shape in the Digital Tax Map and DCPEdited is set to “1”.

Nothing explicitly stated about land vs non-land area, but I imagine since that's the case the entirety of the lot is included in the LotArea
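
The fallback rule in the data dictionary text can be sketched roughly as below. This is an illustration of the quoted wording only, not the PLUTO build code: the function and parameter names are hypothetical, and treating a missing PTS value the same as a zero is an assumption.

```python
from typing import Optional, Tuple


def resolve_lot_area(
    pts_lot_area: Optional[float], dtm_shape_area: float
) -> Tuple[int, Optional[str]]:
    """Return (lot_area, dcp_edited) following the data dictionary wording.

    pts_lot_area: LOT AREA as delivered in PTS (DOF has already derived it
        from frontage x depth, or from the Digital Tax Map for irregular lots).
    dtm_shape_area: area of the lot's Digital Tax Map geometry, in square feet.
    """
    if pts_lot_area:
        # Non-zero PTS value is kept, rounded to the nearest integer
        return round(pts_lot_area), None
    # Zero PTS value (or missing, by assumption): use the DTM geometry and flag it
    return round(dtm_shape_area), "1"


print(resolve_lot_area(5000, 4900.2))  # PTS value wins: (5000, None)
print(resolve_lot_area(0, 2500.4))     # DTM fallback:   (2500, '1')
```

Note that nothing in this rule clips the DTM geometry, which is consistent with the reading that the full lot shape counts toward LotArea.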

sf-dcp commented 2 weeks ago

Hi @caseysmithpgh, thank you for checking the data dictionary. Since the area aligns with the unclipped lot size and there is nothing in data dictionary indicating land-only area, we are moving forward with publishing PLUTO.

caseysmithpgh commented 2 weeks ago

@sf-dcp Great, thanks! Just let me know once it's been promoted and I will get started with our distribution process.

sf-dcp commented 2 weeks ago

@sf-dcp Great, thanks! Just let me know once it's been promoted and I will get started with our distribution process.

Done!

jackrosacker commented 1 week ago

Hello @NYCPlanning/data-engineering! Found some weirdness while prepping QA data for this version of PLUTO today. 9 tax lots are showing a change in the BCT2020 field (census tract), when comparing 24v2 and 24v3. We're going to hold off on publishing until we get a chance to chat with Matt on Monday, and wanted to note the lots to you all as well.

BBL         24v2_BCT2020  24v3_BCT2020
1000160003  1031704       1031703
1012540001  1018300       1017500
2023760048  2006700       2006900
3024140001  3055100       3055500
3067260087  3053200       3076800
4080510001  4003700       4148300
4142600001  4071600       4066401
5004870100  5000600       5002100
5035630042  5029106       5011402

These lots fall into three categories: (1) long linear lots that span multiple census tracts, (2) lots that just brush the edge of a census tract, and (3) small, regular lots that are fully contained within a census tract but have still changed in value.

My query that found these variable values selected BCT2020 arbitrarily, and it's possible that similar changes exist for other fields as well.
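
A version-to-version comparison like this can be generalized to any list of fields. The sketch below is not the query that was actually run; it assumes each version is available as a list of dict rows keyed by `bbl`, and the function name and the third sample lot are made up for illustration. The two real rows are copied from the table above.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Row = Dict[str, str]
Change = Tuple[str, str, Optional[str], Optional[str]]


def changed_values(
    previous: Iterable[Row], current: Iterable[Row], fields: List[str], key: str = "bbl"
) -> List[Change]:
    """List (key, field, old, new) for lots in both versions whose value differs."""
    prev_by_key = {row[key]: row for row in previous}
    changes: List[Change] = []
    for row in current:
        old = prev_by_key.get(row[key])
        if old is None:
            continue  # lot is new in this version; not a value change
        for field in fields:
            if old.get(field) != row.get(field):
                changes.append((row[key], field, old.get(field), row.get(field)))
    return changes


pluto_24v2 = [
    {"bbl": "1000160003", "bct2020": "1031704"},
    {"bbl": "4142600001", "bct2020": "4071600"},
    {"bbl": "9999999999", "bct2020": "1000100"},  # hypothetical unchanged lot
]
pluto_24v3 = [
    {"bbl": "1000160003", "bct2020": "1031703"},
    {"bbl": "4142600001", "bct2020": "4066401"},
    {"bbl": "9999999999", "bct2020": "1000100"},
]

for change in changed_values(pluto_24v2, pluto_24v3, fields=["bct2020"]):
    print(change)
```

Passing more names in `fields` would show whether other geocoded fields changed for the same lots.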

Also note that there are some odd lot boundary changes in the vicinity of 1000160003 (Battery Park City), but these exist in the DTM version we pulled from DOF, and we are reaching out to them separately. Worth looking at though, as none of our existing QA procedures would necessarily catch tax lot overlap errors like these ones, since the zoning isn't changing.

fvankrieken commented 1 week ago

Hmm. These tracts come from geocoding PTS, not via spatial joins, so it's odd that we'd see inconsistent behavior. So it would seem that this is due to either

  1. input to geocoding call changing (PTS address info for these lots)
  2. difference in geocoding (24B vs 24C1 or something like that)
  3. these are lots where we have multiple geocoded PTS rows and we don't aggregate in a consistent/deterministic manner
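
On the third possibility, one way to make the aggregation deterministic is to pick the most frequent tract and break ties with a fixed ordering. This is a sketch under assumptions, not how the build currently aggregates: `pick_tract` is a hypothetical helper and the "smallest value wins a tie" rule is arbitrary.

```python
from collections import Counter
from typing import Iterable, Optional


def pick_tract(tracts: Iterable[Optional[str]]) -> Optional[str]:
    """Choose one census tract from several geocoded rows for the same BBL.

    The most frequent non-empty value wins; ties go to the smallest value,
    so the result does not depend on the order of the input rows.
    """
    counts = Counter(t for t in tracts if t)
    if not counts:
        return None
    return min(counts, key=lambda t: (-counts[t], t))


# Same rows in a different order give the same answer
print(pick_tract(["4071600", "4066401"]))  # 4066401
print(pick_tract(["4066401", "4071600"]))  # 4066401
```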

sf-dcp commented 1 week ago

@jackrosacker @caseysmithpgh

Looking into the issue more, we get census tract field either through geocoding BBLs with Geosupport or via spatial join with census tract dataset. And these BBLs identified by Jack have one of the two situations:

We will be meeting with Amanda to research the issue and see if/how we can solve the changing census tract values long-term. We can move forward with publishing the current PLUTO version.

jackrosacker commented 1 week ago

We can move forward with publishing the current PLUTO version.

@sf-dcp thanks for the summary and sounds good. We also had a chance to check in with @croswell81 re the tax lot errors that we found in the Battery Park City area, and it looks like we will need to re-build PLUTO 24v3 to incorporate the DTM that DOF fixed at end of day last Friday.

We're in the process of republishing today's DTM to Digital Ocean and will update when the build is ready to go.

Worth discussing at some point how to add some topology rules to our QA process to catch things like significant tax lot overlaps - might be a good topic for when we sit down for the next PLUTO QA together.
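
As a starting point for that discussion, here is a minimal sketch of a first-pass overlap check. It is heavily simplified: lots are represented by axis-aligned bounding boxes rather than real DTM geometries, the 5% threshold and all names are assumptions, and a real topology rule would compare the actual polygons (for example with a spatial library or database).

```python
from itertools import combinations
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def box_area(box: Box) -> float:
    min_x, min_y, max_x, max_y = box
    return max(0.0, max_x - min_x) * max(0.0, max_y - min_y)


def overlap_area(a: Box, b: Box) -> float:
    """Area shared by two boxes (0.0 if they only touch or are apart)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return width * height if width > 0 and height > 0 else 0.0


def significant_overlaps(
    lots: Dict[str, Box], min_share: float = 0.05
) -> List[Tuple[str, str, float]]:
    """Pairs of lots whose shared area is at least min_share of the smaller lot."""
    flagged = []
    for (id_a, box_a), (id_b, box_b) in combinations(sorted(lots.items()), 2):
        smaller = min(box_area(box_a), box_area(box_b))
        shared = overlap_area(box_a, box_b)
        if smaller > 0 and shared / smaller >= min_share:
            flagged.append((id_a, id_b, shared / smaller))
    return flagged


# Made-up lots for illustration only
lots = {
    "lot_a": (0, 0, 100, 50),
    "lot_b": (100, 0, 200, 50),  # shares an edge with lot_a: not an overlap
    "lot_c": (80, 10, 120, 40),  # sits across the lot_a / lot_b boundary
}
flagged = significant_overlaps(lots)
print(flagged)  # [('lot_a', 'lot_c', 0.5), ('lot_b', 'lot_c', 0.5)]
```

The pairwise loop is quadratic, so a citywide run would need a spatial index or a database-side join instead.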

caseysmithpgh commented 1 week ago

@sf-dcp updated DTM has been published to staging

FYI @croswell81 @jackrosacker

sf-dcp commented 1 week ago

Sounds good! Will let you know when it's ready for the 2nd round of QA

sf-dcp commented 1 week ago

@caseysmithpgh @jackrosacker @croswell81

Not creating a GH issue for QA since we've started this data update before the new process.

PLUTO 24v3 has been re-built and promoted to draft folder under draft/24v3/2-update-dtm-and-correct-units/.

Notes:

croswell81 commented 1 week ago

[like] Matthew Croswell (DCP) reacted to your message:


sf-dcp commented 1 week ago

wow, I didn't know email reactions get posted here lol @croswell81

croswell81 commented 1 week ago

@fvankrieken I was not aware of that either.

jackrosacker commented 6 days ago

Hey DE, noting that we're still seeing issues with the DTM that was used for this latest draft build of PLUTO. We've flagged this to DOF for troubleshooting, and to Amanda to help determine best next steps for publication, so just keeping you all up to date. The issue is either on the DOF or GIS Team side, so not DE build-related.

caseysmithpgh commented 2 days ago

Hey DE, Matt was able to get a clean copy of the DTM from DOF and/or their consultant. I'll plan to process it first thing tomorrow so y'all can get started with a re-build.

Can you confirm what zoning data was used to build 24v3 draft 2? If it was June, would it be reasonable for draft 3 to be built with July zoning (latest)?

FYI @AmandaDoyle @croswell81

damonmcc commented 1 day ago

@caseysmithpgh the version I'm seeing for zoning data used in 24v3 draft 2 is 20240807, and that seems to be the latest

so draft 3 will also have the latest

caseysmithpgh commented 1 day ago

@damonmcc sounds good. I can confirm 20240807 reflects GIS 20240731 version. Clean DTM files have been published to /staging and /20240826 -- so y'all are good to go for draft 3.

FYI @croswell81 @jackrosacker

sf-dcp commented 1 day ago

Per discussion with @AmandaDoyle, we will manually correct 2020 census tract value for the JFK airport BBL (4142600001) as a part of this PLUTO version (@jackrosacker, this BBL is from your list above). We will continue researching the other BBLs with GRU.

We will let you know when new draft is ready for your review.

cc: @croswell81 @caseysmithpgh

sf-dcp commented 8 hours ago

Hi GIS team, new draft 3-update-dtm-fix-jfk-lot-censustract/ is ready for your review.

In this draft, we used updated DTM data. We also corrected 2020 census tract value for the JFK lot.

@caseysmithpgh @jackrosacker @croswell81