Closed (damonmcc closed 1 year ago)
Not sure if this is related to the core build issue, but I'm noticing some potentially incorrect source dataset dates in the source_data_versions.csv and similarly in the ZTL QC app
The dates I think we should see here are:
Ok, I was being dumb. I hadn't run the Dataloading action prior to the Build action during my previous attempts. I just re-uploaded the latest dof-dtm file to Digital Ocean and ran the Dataloading action which is unfortunately failing
The input dof-dtm file used for the action above is from the 2023-04-28 geodatabase and had a scripted Check Geometry, Repair Geometry, Check Geometry sequence applied to it. The Repair Geometry tool was set to retain null geometries and to use the Esri method rather than the OGC method.
Replaced Tax Lot Polygon file on Digital Ocean with an un-repaired version of the 2023-04-28 dataset (same input dataset as from previous run, but I skipped the Check/Repair geometry steps), re-ran Dataloading action. Action failed on dof-dtm dataset.
Wondering which of the other dof-dtm files I uploaded to DO may also impact the dof-dtm file during the Dataloading action? My assumption so far is that the Tax Lot Polygon feature class (dof_dtm_tax_lot_polygon.zip on DO) is the primary constituent of the file listed in the actions above
Re-ran Dataloading step, another failure.
This time the input dof-dtm data was all prepared by hand according to the instructions in the Open Data Cycle Docs. The Tax Lot Polygon was exported to shapefile and did not have Repair Geometry run. The raw zipped shapefile was uploaded to DO. Dataloading is still failing on this dataset.
Next steps: keep all other dof-dtm datasets as-is on DO (manually processed data currently on DO), and replace dof_dtm_tax_lot_poly DO dataset with a version that has had Repair Geometry run on it. Consider running the tool repeatedly until no geometry errors persist.
@damonmcc this still may be due to errors on my end, but is it self-evident to you from the Dataloading logs where in the process the dof-dtm loading is failing?
One small thought on this and other gdal failures: if we haven't done this before, we can use gdal.SetConfigOption('CPL_DEBUG', 'ON') to have GDAL report debug messages in the console output.
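A minimal sketch of that suggestion, assuming the Dataloading code uses the GDAL Python bindings (`osgeo.gdal`). The helper name `enable_gdal_debug` is hypothetical, and the import guard is only there so the snippet also runs where GDAL is not installed. By default GDAL writes these debug messages to stderr.

```python
try:
    from osgeo import gdal  # GDAL Python bindings
except ImportError:  # let the sketch run where GDAL is absent
    gdal = None


def enable_gdal_debug() -> bool:
    """Turn on verbose GDAL logging; return True if GDAL was available."""
    if gdal is None:
        return False
    # Emit debug messages from all GDAL/OGR components.
    gdal.SetConfigOption("CPL_DEBUG", "ON")
    # Raise Python exceptions on errors instead of failing silently.
    gdal.UseExceptions()
    return True


if __name__ == "__main__":
    print("GDAL debug enabled:", enable_gdal_debug())
```

Calling this once at the top of the loading script, before any dataset is opened, should be enough for the failing step to leave more detail in the action log.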
Attempting manual check/repair geom, and then Dataloading. Check Geom on original dataset:
arcpy.management.CheckGeometry(
r"source\export_20230428.gdb\Cadastral\Tax_Lot_Polygon",
r"dev\gis-dof-dtm\gis\dof-dtm-processing\dof-dtm-processing.gdb\taxLotPoly_20230428_CheckGeom_original_dataset",
"ESRI"
)
Results:
Error type | Count | Notes |
---|---|---|
could not find spatial index | 1 | |
null geometry | 5 | |
self intersections | 421 | |
Total | 427 |
Moved original dataset to staging geodatabase, ran Check Geom on dataset now located on staging geodatabase (should at least remove the sp. index error)
arcpy.management.CheckGeometry(
r"dev\gis-dof-dtm\gis\dof-dtm-processing\dof-dtm-processing.gdb\export_20230428_gdb_Tax_Lot_Polygon",
r"dev\gis-dof-dtm\gis\dof-dtm-processing\dof-dtm-processing.gdb\taxLotPoly_20230428_CheckGeom_staging_gdb_dataset",
"ESRI"
)
Results:
Error type | Count | Notes |
---|---|---|
null geometry | 5 | |
self intersections | 421 | |
Total | 426 |
Ran Repair Geometry on dataset in staging geodatabase (first repair)
arcpy.management.RepairGeometry(
r"dev\gis-dof-dtm\gis\dof-dtm-processing\dof-dtm-processing.gdb\export_20230428_gdb_Tax_Lot_Polygon",
"KEEP_NULL",
"ESRI"
)
Ran Check Geometry on dataset in staging geodatabase that has been repaired once
same as last Check Geometry run
Results:
Error type | Count | Notes |
---|---|---|
null geometry | 14 | Note increase from prior run |
self intersections | 1 | Note presence of error |
Total | 15 |
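To make the before/after comparison explicit, here is a small sketch that diffs the error counts from the two Check Geometry runs on the staging dataset. The counts are copied from the tables in this thread; the function name `count_changes` is hypothetical.

```python
from collections import Counter

# Counts copied from the Check Geometry tables in this thread.
before_repair = Counter({"null geometry": 5, "self intersections": 421})
after_repair = Counter({"null geometry": 14, "self intersections": 1})


def count_changes(before, after):
    """Return {error type: after - before} for every type seen in either run."""
    kinds = sorted(set(before) | set(after))
    return {kind: after[kind] - before[kind] for kind in kinds}


changes = count_changes(before_repair, after_repair)
for kind, delta in changes.items():
    print(f"{kind}: {before_repair[kind]} -> {after_repair[kind]} ({delta:+d})")
```

The first repair removes 420 of the 421 self intersections but adds 9 null geometries.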
Exported the once-repaired dataset to shapefile, uploaded it to Digital Ocean, and ran the Dataloading action: failure.
Ran Repair Geometry again on dataset in staging geodatabase (second repair)
same as previous Repair Geometry run
Ran Check Geometry on dataset in staging geodatabase that has been repaired twice
same as last Check Geometry run
Results:
Error type | Count | Notes |
---|---|---|
null geometry | 14 | |
self intersections | 1 | |
Total | 15 |
Since error count is the same as from the last run, I'm not running the Dataloading action again with the twice-repaired dataset.
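The stop rule used here (repair again only while the error count keeps changing) could be scripted roughly as below. This is a sketch, not the actual scripted process: `check` and `repair` are hypothetical callables that would wrap Check Geometry (returning the number of rows in its output table) and Repair Geometry respectively.

```python
from typing import Callable, Tuple


def repair_until_stable(
    check: Callable[[], int],
    repair: Callable[[], None],
    max_passes: int = 5,
) -> Tuple[int, int]:
    """Repair until the error count reaches zero or stops changing.

    Returns (repair_passes_run, final_error_count).
    """
    errors = check()
    passes = 0
    while errors > 0 and passes < max_passes:
        repair()
        passes += 1
        remaining = check()
        if remaining == errors:  # no progress: another pass will not help
            break
        errors = remaining
    return passes, errors


# Dry run with the totals reported in this thread: 426, then 15, then 15.
counts = iter([426, 15, 15])
result = repair_until_stable(lambda: next(counts), lambda: None)
print(result)  # (2, 15)
```

With the thread's numbers the loop stops after the second repair, which matches the decision not to re-run Dataloading with the twice-repaired dataset.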
@damonmcc
Tested the Dataloading step by using the last successful input dataset from 2023-03-31
Moved the 2023-03-31 dataset to the staging folder in DO
Ran the Dataloading action successfully with that dataset in the staging folder
Running Check Geometry on the 2023-03-31 dof-dtm shapefile downloaded from DO resulted in:
Error type | Count | Notes |
---|---|---|
null geometry | 15 | |
Total | 15 |
Note that there are no self intersection errors in this dataset, so potentially this was the change introduced by DOF in the last month.
notes from investigating/brainstorming with @jackrosacker yesterday:
(no geometry means shape_area = 0 and shape_length = 0)
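A sketch of how that rule could be used to flag geometry-less features from an attribute table. The field names `shape_area` and `shape_length` follow the note above; the real column names in the shapefile may differ, and the rows and BBLs below are made up for illustration.

```python
def has_no_geometry(record: dict) -> bool:
    """Treat a feature as geometry-less when area and length are both zero or missing."""
    return not record.get("shape_area") and not record.get("shape_length")


# Made-up attribute rows, for illustration only.
rows = [
    {"bbl": "1000000001", "shape_area": 2500.0, "shape_length": 200.0},
    {"bbl": "1000000002", "shape_area": 0.0, "shape_length": 0.0},
    {"bbl": "1000000003", "shape_area": None, "shape_length": None},
]

suspect = [row["bbl"] for row in rows if has_no_geometry(row)]
print(suspect)
```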
@croswell81 has approved short term fix.
Next steps are to apply feature merge, re-run repair geom tool, and attempt Dataloading action. I'll also contact DOF about this specific polygon if it proves to be the final sticking point for an automated process.
Edit: For reference, the BBL of the problem feature is 4158250007
Dataloading step still failing after two separate attempts with two different dof-dtm conditions:
At this point the remaining paths forward I can see are to:
@caseysmithpgh I tried deleting the sticky self intersecting polygon and re-running the Dataloading action with no luck. Can you process Tax Lot Poly and run the action on Monday if you have time? I'd be happy to look over your shoulder and look for differences as well.
I'll also try the repair > import > Dataloading sequence on the newest dataset in export_20230505.gdb on Monday as well.
Tried running Dataloading w. more recent 2023-05-05 dof-dtm, without running Repair Geometry. Failed run.
Tried running Dataloading w. more recent 2023-05-05 dof-dtm with running Repair Geometry. Failed run
hey @jackrosacker. it's too bad the failed runs don't give any details on why they fail. but they're all identical and seem to be the same as the original failures last month due to at least one invalid/null geometry
for the dataloading run 3 days ago, you mentioned using a file in which you deleted the sticky self-intersecting polygon. in that file, were there any other remaining self-intersecting, null, or otherwise invalid geometries?
if so, it seems worth manually removing all invalid geometries using ArcGIS Pro to make it as easy as possible for this gdal python function to succeed during dataloading
Morning @damonmcc. For the run in which I deleted the un-repairable geometry, I tried it with the 14 persistent null geometries present, and then I tried again after having deleted them, both resulting in failed runs. @caseysmithpgh is going to try prepping the data and running the action with the 4/28 data to see if I'm making some hidden error with the data prep stage.
if that fails, I can try manually doing the steps that dataloading is intended to do. I believe they're just:
Had success with 4.28 version
To highlight the key workflow difference: the Dataloading action is successful when Repair Geometry is run on the output shapefile, but is not successful when run on the parent feature class 🤦
hey @jackrosacker, sorry I feel like we talked about it but is this working now?
@damonmcc it is working, yes. Forgot to close this out.
Basically, the Dataloading action fails if Repair Geometry is not run, or if it is run on the feature class prior to exporting to shapefile. In order for the Dataloading action to accept the input DTM, Repair Geometry has to be run on the shapefile after it has been exported from feature class. Reasons unclear, but that has been the consistent result after a few runs by both me and Casey.
The scripted process has been updated to reflect this order of operations, so we'll revisit next Open Data cycle.
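The working order of operations (export first, then repair the exported shapefile) might look like the sketch below. It is not the updated script itself. Assumptions: `CopyFeatures` stands in for whichever export tool was actually used, the function name `export_then_repair` is hypothetical, and `gp` is the `arcpy` module passed in as a parameter so the function can be dry-run without ArcGIS. The Repair Geometry options are the ones used earlier in this thread.

```python
def export_then_repair(gp, feature_class: str, out_shapefile: str) -> str:
    """Export a feature class to shapefile, then repair the shapefile itself.

    `gp` is the arcpy module (or a stand-in exposing the same two tools).
    """
    # 1. Export first; the parent feature class is left un-repaired.
    gp.management.CopyFeatures(feature_class, out_shapefile)
    # 2. Repair the exported shapefile, keeping null geometries, Esri method.
    gp.management.RepairGeometry(out_shapefile, "KEEP_NULL", "ESRI")
    return out_shapefile


# Usage inside an ArcGIS Pro Python environment (paths are placeholders):
#   import arcpy
#   export_then_repair(arcpy, r"<path to Tax_Lot_Polygon>", r"<output .shp path>")
```

Keeping both steps in one function makes it harder to accidentally repair the feature class before export, which is the ordering that failed.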
notes: differences in the qaqc section of the run (good vs. bad)