NYCPlanning / data-engineering

Primary repository for NYC DCP's Data Engineering team

Null date handling issues with gdal #508

Closed: fvankrieken closed this issue 5 months ago

fvankrieken commented 5 months ago

The latest zoning map amendments dataset has an invalid date, which is causing errors in PLUTO/ZTL builds.

fvankrieken commented 5 months ago

@jackrosacker

fvankrieken commented 5 months ago

dof_shoreline has the same issue. Not sure whether something has gone wrong with our data library step or on the source data side. I wonder if all "ztl dataloading" datasets are affected. Going to investigate more tomorrow.

jackrosacker commented 5 months ago

Hmm, that's peculiar. I'm seeing the error message that references the "effective" column, but not seeing any non-conforming date values when I open it in GIS. The column does have a bunch of null values but that is always the case. Odd that it's throwing an error now, but ran fine last week. We sent the zoning data back to TRD for edits, but Andrew only updated the zoning districts file. Also noteworthy that the shoreline dataset hasn't changed since 2022, so not sure why that's erroring.

I'll take another look at the source data tomorrow, and let's touch base when you've had a chance to review.
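A programmatic check of a date column like "effective" can complement eyeballing it in GIS. The following is a minimal sketch, not the team's actual tooling: the function names and the accepted date layouts are assumptions made for illustration, and it works on plain Python values rather than reading the shapefile itself.

```python
from collections import Counter
from datetime import datetime

# Assumed date layouts; adjust to whatever the loader actually emits.
DATE_FORMATS = ("%Y/%m/%d", "%Y-%m-%d")


def classify_date(value):
    """Return 'null', 'valid', or 'invalid' for one raw field value."""
    if value is None or str(value).strip() == "":
        return "null"
    text = str(value).strip()
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(text, fmt)
            return "valid"
        except ValueError:
            continue
    return "invalid"


def summarize_dates(values):
    """Count how many values fall into each category."""
    return Counter(classify_date(v) for v in values)


# Hypothetical sample values, not taken from the real dataset.
sample = [None, "2023/11/15", "0000/00/00", ""]
print(summarize_dates(sample))
```

A nonzero "invalid" count on a column that should only contain dates and nulls would point at the load step rather than the source file.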

fvankrieken commented 5 months ago

And the source data for dof_shoreline (datasets/dof_dtm_shoreline_polygon) hasn't changed since '22, so this seems like a change on our end. We recently bumped gdal from 3.6 to 3.8; I wonder if it's handling something differently.

jackrosacker commented 5 months ago

Yeah that could be it. Maybe new gdal is stumbling over NULL values in the "effective" column without explicit handling? Can we drop gdal back a version so we can get ZTL built and QA'd while your testing happens?

fvankrieken commented 5 months ago

Would it be okay for me to see if we can get a fix today for this? And then either resolve it today, or revert so we can get ZTL built EOD today/Tuesday. Let me know if it'd be better for y'all to just have ZTL built ASAP.

fvankrieken commented 5 months ago

It seems pretty clear that the cause is how GDAL is handling NULL datetimes. dof_shoreline has 1 feature and 5 fields or so, and I've verified the datetimes are null (at least while importing the shp from edm-publishing into QGIS). The data library pgdump from 5 days ago has NULLs as it should, while the new one has "0000/00/00".

Hopefully there's some sort of flag we can use at export for this
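Independent of any export-side option, the "0000/00/00" placeholder could also be mapped back to NULL after loading. This is a minimal sketch under stated assumptions: rows are represented as plain dicts, `null_zero_dates` is a hypothetical helper, and the dashed placeholder form is a guess. It is not the fix that was ultimately applied in this repository.

```python
# Placeholder strings treated as "no date". The slash form is the one
# reported in this thread; the dashed form is an assumption.
ZERO_DATES = {"0000/00/00", "0000-00-00"}


def null_zero_dates(rows, date_columns):
    """Return copies of rows with zero-date placeholders replaced by None."""
    cleaned = []
    for row in rows:
        fixed = dict(row)  # copy so the caller's rows are not mutated
        for col in date_columns:
            if fixed.get(col) in ZERO_DATES:
                fixed[col] = None
        cleaned.append(fixed)
    return cleaned


# Hypothetical rows for illustration only.
rows = [
    {"id": 1, "effective": "0000/00/00"},
    {"id": 2, "effective": "2023/11/15"},
]
print(null_zero_dates(rows, ["effective"]))
```

Real dates pass through untouched, so a cleanup like this could be applied to every date column without knowing in advance which datasets are affected.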

jackrosacker commented 5 months ago

> Would it be okay for me to see if we can get a fix today for this? And then either resolve it today, or revert so we can get ZTL built EOD today/Tuesday. Let me know if it'd be better for y'all to just have ZTL built ASAP.

Go for it! A proper fix would be great. I would ideally want to be finishing QA on Tuesday if at all possible though. Had a few hiccups with the origin data and want to wrap up this phase of Open Data soon. Thanks for flagging the failed build and jumping on this so quickly.

fvankrieken commented 5 months ago

It's been a frustrating day. Rerunning ztl dataloading action (with ztl build) with gdal 3.6.2 now. Draft name for ztl output will be fvk-downgrade-gdal

jackrosacker commented 5 months ago

Sounds good, thanks @fvankrieken. Looks like the build succeeded on your downgrade branch, so I'll run QA from that

jackrosacker commented 5 months ago

Just a heads up that we may have to re-run the build again on Wed or so next week - if that impacts your troubleshooting let me know

fvankrieken commented 5 months ago

All good - think I have this fixed moving forward, see #513 if you're interested in the specifics