NYCPlanning / db-zoningtaxlots

Zoning Tax Lot Database (ZTLDB) -- associates zoning designations with lots from the Department of Finance Digital Tax Map
https://edm-data-engineering.nycplanningdigital.com/?page=Zoning+Tax+Lots

build is failing #134

Closed damonmcc closed 1 year ago

damonmcc commented 1 year ago

notes

CREATE TABLE
psql:sql/qaqc/in_bbldiffs.sql:38: NOTICE:  table "qc_bbldiffs" does not exist, skipping
COPY 1
DROP TABLE
SELECT 0

jackrosacker commented 1 year ago

Not sure if this is related to the core build issue, but I'm noticing some potentially incorrect source dataset dates in the source_data_versions.csv and similarly in the ZTL QC app image

The dates I think we should see here are:

jackrosacker commented 1 year ago

Ok, I was being dumb. I hadn't run the Dataloading action prior to the Build action during my previous attempts. I just re-uploaded the latest dof-dtm file to Digital Ocean and ran the Dataloading action which is unfortunately failing

The input dof-dtm file used for the action above is from the 2023-04-28 geodatabase and had a scripted Check Geometry, Repair Geometry, Check Geometry sequence applied to it. The Repair Geometry tool was set to retain null geometries and to use the Esri method rather than the OGC method.

jackrosacker commented 1 year ago

Replaced Tax Lot Polygon file on Digital Ocean with an un-repaired version of the 2023-04-28 dataset (same input dataset as from previous run, but I skipped the Check/Repair geometry steps), re-ran Dataloading action. Action failed on dof-dtm dataset.

Wondering which of the other dof-dtm files I uploaded to DO may also impact the dof-dtm file during the Dataloading action? My assumption so far is that the Tax Lot Polygon feature class (dof_dtm_tax_lot_polygon.zip on DO) is the primary constituent of the file listed in the actions above

jackrosacker commented 1 year ago

Re-ran Dataloading step, another failure.

This time the input dof-dtm data was all prepared by hand according to the instructions in the Open Data Cycle Docs. The Tax Lot Polygon was exported to shapefile and did not have Repair Geometry run on it. The raw zipped shapefile was uploaded to DO. Dataloading is still failing on this dataset.

Next steps: keep all other dof-dtm datasets as-is on DO (manually processed data currently on DO), and replace dof_dtm_tax_lot_poly DO dataset with a version that has had Repair Geometry run on it. Consider running the tool repeatedly until no geometry errors persist.

jackrosacker commented 1 year ago

@damonmcc this still may be due to errors on my end, but is it self-evident to you from the Dataloading logs where in the process the dof-dtm loading is failing?

fvankrieken commented 1 year ago

One small thought on this and other gdal failures - if we haven't done this before we can use gdal.SetConfigOption('CPL_DEBUG', 'ON') to report debug messages to std output.
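
A minimal sketch of that suggestion, assuming the Dataloading code runs in a Python process with the GDAL bindings (`osgeo`) available. The helper name `enable_gdal_debug` is hypothetical and not taken from the repo. By default GDAL's error handler writes debug messages to stderr, so the action logs would need to capture that stream.

```python
import os


def enable_gdal_debug() -> bool:
    """Turn on verbose GDAL/OGR debug messages; return True if GDAL was importable."""
    # GDAL also reads configuration options from environment variables,
    # so child processes started afterwards inherit the setting.
    os.environ["CPL_DEBUG"] = "ON"
    try:
        from osgeo import gdal
    except ImportError:
        # GDAL bindings are not installed in this interpreter.
        return False
    gdal.SetConfigOption("CPL_DEBUG", "ON")
    # Raise Python exceptions on GDAL errors instead of failing quietly.
    gdal.UseExceptions()
    return True


gdal_available = enable_gdal_debug()
```

Calling this once before any dataset is opened should make the failing dof-dtm step report which driver call or feature it stops on.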

jackrosacker commented 1 year ago

Attempting manual check/repair geom, and then Dataloading. Check Geom on original dataset:

arcpy.management.CheckGeometry(
    r"source\export_20230428.gdb\Cadastral\Tax_Lot_Polygon", 
    r"dev\gis-dof-dtm\gis\dof-dtm-processing\dof-dtm-processing.gdb\taxLotPoly_20230428_CheckGeom_original_dataset",
    "ESRI"
)

Results:

| Error type | Count | Notes |
| --- | --- | --- |
| could not find spatial index | 1 | |
| null geometry | 5 | |
| self intersections | 421 | |
| Total | 427 | |

Moved the original dataset to the staging geodatabase, then ran Check Geom on the staged copy (this should at least remove the spatial index error)

arcpy.management.CheckGeometry(
    r"dev\gis-dof-dtm\gis\dof-dtm-processing\dof-dtm-processing.gdb\export_20230428_gdb_Tax_Lot_Polygon",
    r"dev\gis-dof-dtm\gis\dof-dtm-processing\dof-dtm-processing.gdb\taxLotPoly_20230428_CheckGeom_staging_gdb_dataset",
    "ESRI"
)

Results:

| Error type | Count | Notes |
| --- | --- | --- |
| null geometry | 5 | |
| self intersections | 421 | |
| Total | 426 | |

Ran Repair Geometry on dataset in staging geodatabase (first repair)

arcpy.management.RepairGeometry(
    r"dev\gis-dof-dtm\gis\dof-dtm-processing\dof-dtm-processing.gdb\export_20230428_gdb_Tax_Lot_Polygon",
    "KEEP_NULL",
    "ESRI"
)

Ran Check Geometry on dataset in staging geodatabase that has been repaired once

same as last Check Geometry run

Results:

| Error type | Count | Notes |
| --- | --- | --- |
| null geometry | 14 | Note increase from prior run |
| self intersections | 1 | Note presence of error |
| Total | 15 | |
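
To make the before/after comparison explicit, here is a small sketch that diffs the Check Geometry counts reported in this thread for the staging dataset. The function name `count_changes` is made up for illustration; only the counts come from the results above.

```python
# Check Geometry counts reported in this thread (staging geodatabase copy).
before_repair = {"null geometry": 5, "self intersections": 421}
after_first_repair = {"null geometry": 14, "self intersections": 1}


def count_changes(before, after):
    """Return {error type: after - before} for every error type seen in either run."""
    error_types = sorted(set(before) | set(after))
    return {k: after.get(k, 0) - before.get(k, 0) for k in error_types}


changes = count_changes(before_repair, after_first_repair)
for error_type, delta in changes.items():
    print(f"{error_type}: {delta:+d}")
```

The first repair removed 420 self intersections and added 9 null geometries. One plausible reading is that some repaired features ended up with empty geometry, but the thread does not confirm that.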

  1. Exported once-repaired dataset to shapefile
  2. Uploaded to Digital Ocean
  3. Ran Dataloading action - failure


Ran Repair Geometry again on dataset in staging geodatabase (second repair)

same as previous Repair Geometry run

Ran Check Geometry on dataset in staging geodatabase that has been repaired twice

same as last Check Geometry run

Results:

| Error type | Count | Notes |
| --- | --- | --- |
| null geometry | 14 | |
| self intersections | 1 | |
| Total | 15 | |

Since error count is the same as from the last run, I'm not running the Dataloading action again with the twice-repaired dataset.
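
The stop-when-nothing-changes rule used here can be sketched as a loop. This is only an illustration: `repair_until_stable` is a hypothetical helper, and `check` and `repair` stand in for whatever wraps `arcpy.management.CheckGeometry` and `arcpy.management.RepairGeometry`. The demo replays the counts reported in this thread instead of calling arcpy.

```python
def repair_until_stable(check, repair, max_passes=5):
    """Alternate repair and check until the error counts stop changing.

    check() returns {error type: count} with only non-zero counts;
    repair() modifies the dataset in place.
    Returns the list of count dicts, one per check.
    """
    history = [check()]
    for _ in range(max_passes):
        if not history[-1]:
            break  # no geometry errors left
        repair()
        history.append(check())
        if history[-1] == history[-2]:
            break  # another pass changed nothing, so stop
    return history


# Replay of the three Check Geometry results reported above.
reported = iter([
    {"null geometry": 5, "self intersections": 421},
    {"null geometry": 14, "self intersections": 1},
    {"null geometry": 14, "self intersections": 1},
])
repairs = []
history = repair_until_stable(check=lambda: next(reported),
                              repair=lambda: repairs.append("repair"))
print(len(repairs), history[-1])
```

With these inputs the loop stops after the second repair, matching the decision not to re-run Dataloading on the twice-repaired dataset.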


jackrosacker commented 1 year ago

@damonmcc

Tested the Dataloading step by using the last successful input dataset from 2023-03-31

Moved the 2023-03-31 dataset to the staging folder in DO

Ran the Dataloading action successfully with that dataset in the staging folder

Running Check Geometry on the 2023-03-31 dof-dtm shapefile downloaded from DO resulted in:

| Error type | Count | Notes |
| --- | --- | --- |
| null geometry | 15 | |
| Total | 15 | |

Note that there are no self intersection errors in this dataset, so potentially this was the change introduced by DOF in the last month.

damonmcc commented 1 year ago

notes from investigating/brainstorming with @jackrosacker yesterday:

(no geometry means shape_area = 0 and shape_length = 0)
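
As a rough illustration of that rule, the sketch below splits attribute records into those with and without geometry. It assumes records are plain dicts carrying `shape_area` and `shape_length`; the `bbl` key and the sample rows are made up for the example and are not real lots.

```python
def split_null_geometry(records):
    """Split records into (with_geometry, null_geometry).

    A record counts as having no geometry when both shape_area and
    shape_length are 0, per the note above.
    """
    with_geom, null_geom = [], []
    for rec in records:
        is_null = rec["shape_area"] == 0 and rec["shape_length"] == 0
        (null_geom if is_null else with_geom).append(rec)
    return with_geom, null_geom


# Made-up rows for illustration only.
sample = [
    {"bbl": "0000000001", "shape_area": 2500.0, "shape_length": 200.0},
    {"bbl": "0000000002", "shape_area": 0, "shape_length": 0},
]
with_geom, null_geom = split_null_geometry(sample)
print([rec["bbl"] for rec in null_geom])
```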

jackrosacker commented 1 year ago

@croswell81 has approved short term fix.

Next steps are to apply feature merge, re-run repair geom tool, and attempt Dataloading action. I'll also contact DOF about this specific polygon if it proves to be the final sticking point for an automated process.

Edit: For reference, the BBL of the problem feature is 4158250007

jackrosacker commented 1 year ago

Dataloading step still failing after two separate attempts with two different dof-dtm conditions:

  1. Manually merging BBL 4158250007 with parent and running Repair Geometry, and running Dataloading action, then
  2. Taking same SHP as above and additionally deleting 14 features with null geometry, leaving no remaining geometry errors (per Esri standard) and running Dataloading action

At this point the remaining paths forward I can see are to:

  1. Ask @caseysmithpgh to manually process/upload the dof-dtm file and re-run the action in case there is an unidentified part of the process that I am doing differently, AND
  2. Ask @damonmcc to add a more verbose logging function to the Dataloading action to see if we can point more directly at what is going wrong

jackrosacker commented 1 year ago

@caseysmithpgh I tried deleting the sticky self intersecting polygon and re-running the Dataloading action with no luck. Can you process Tax Lot Poly and run the action on Monday if you have time? I'd be happy to look over your shoulder and look for differences as well.

I'll also try the repair > import > Dataloading sequence on the newest dataset in export_20230505.gdb on Monday as well.

jackrosacker commented 1 year ago

Tried running Dataloading with the more recent 2023-05-05 dof-dtm, without running Repair Geometry. Failed run.

Tried running Dataloading with the more recent 2023-05-05 dof-dtm, after running Repair Geometry. Failed run.

damonmcc commented 1 year ago

hey @jackrosacker. it's too bad the failed runs don't give any details on why they fail. but they're all identical and seem to be the same as the original failures last month due to at least one invalid/null geometry

for the dataloading run 3 days ago, you mentioned using a file in which you deleted the sticky self-intersecting polygon. in that file, were there any other remaining self-intersecting, null, or otherwise invalid geometries?

if so, it seems worth manually removing all invalid geometries using ArcGIS Pro to make it as easy as possible for this gdal python function to succeed during dataloading

jackrosacker commented 1 year ago

Morning @damonmcc. For the run in which I deleted the un-repairable geometry, I tried it with the 14 persistent null geometries present, and then I tried again after having deleted them, both resulting in failed runs. @caseysmithpgh is going to try prepping the data and running the action with the 4/28 data to see if I'm making some hidden error with the data prep stage.

damonmcc commented 1 year ago

if that fails, I can try manually doing the steps that dataloading is intended to do. I believe they're just:

  1. reproject the geometry
  2. upload to digital ocean
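
For the first of those steps, a manual reprojection could be done with GDAL's `ogr2ogr` command-line tool. The sketch below only builds the command. The file names and the target CRS (`EPSG:4326`) are assumptions for illustration, not values taken from the Dataloading code, and the upload step is left out because the thread does not describe how it is done.

```python
import shlex


def build_reproject_command(src_shp, dst_shp, target_srs="EPSG:4326"):
    """Build an ogr2ogr call that reprojects src_shp into dst_shp.

    ogr2ogr takes the destination before the source; -t_srs sets the
    output coordinate reference system.
    """
    return ["ogr2ogr", "-f", "ESRI Shapefile", "-t_srs", target_srs, dst_shp, src_shp]


# Hypothetical file names; the target CRS is an assumption.
cmd = build_reproject_command("dof_dtm_tax_lot_polygon.shp", "dof_dtm_reprojected.shp")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Running the command by hand on the failing shapefile would show whether the reprojection step itself raises an error outside the action.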
caseysmithpgh commented 1 year ago

Had success with 4.28 version

jackrosacker commented 1 year ago

To highlight the key workflow difference: the Dataloading action succeeds when Repair Geometry is run on the exported shapefile, but fails when Repair Geometry is run on the parent feature class before export 🤦

damonmcc commented 1 year ago

hey @jackrosacker, sorry I feel like we talked about it but is this working now?

jackrosacker commented 1 year ago

@damonmcc it is working, yes. Forgot to close this out.

Basically, the Dataloading action fails if Repair Geometry is not run, or if it is run on the feature class prior to exporting to shapefile. In order for the Dataloading action to accept the input DTM, Repair Geometry has to be run on the shapefile after it has been exported from feature class. Reasons unclear, but that has been the consistent result after a few runs by both me and Casey.

The scripted process has been updated to reflect this order of operations, so we'll revisit next Open Data cycle.
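
For reference, the order of operations described above might look like the sketch below. This is not the actual scripted process, which is not shown in this thread. `export_then_repair` is a hypothetical helper, the `arcpy` module is passed in so the function can be read without ArcGIS installed, and the `KEEP_NULL` / `ESRI` settings are carried over from the earlier Repair Geometry calls as an assumption.

```python
import os


def export_then_repair(arcpy, feature_class, out_folder, out_name):
    """Export a feature class to shapefile, then repair the shapefile.

    Repair Geometry runs on the exported shapefile, not on the parent
    feature class, which is the order that worked in this thread.
    """
    # Step 1: export. An out_name ending in .shp in a folder yields a shapefile.
    arcpy.conversion.FeatureClassToFeatureClass(feature_class, out_folder, out_name)
    shapefile = os.path.join(out_folder, out_name)
    # Step 2: repair the exported shapefile (settings assumed from earlier runs).
    arcpy.management.RepairGeometry(shapefile, "KEEP_NULL", "ESRI")
    return shapefile


# Usage inside an ArcGIS Pro Python environment (paths are placeholders):
#   import arcpy
#   export_then_repair(arcpy, r"<path to Tax_Lot_Polygon>", r"<output folder>",
#                      "dof_dtm_tax_lot_polygon.shp")
```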