NYCPlanning / db-knownprojects

KPDB: A compilation of prospective residential development projects from various sources, with rough projections of new unit counts
https://nycplanning.github.io/db-knownprojects

Clusters Got Broken Up Unexpectedly #372

Closed td928 closed 1 year ago

td928 commented 1 year ago

Two reviewers preferred, one required.

Addresses #352, where some clusters that pretty clearly have spatial overlaps were broken up.

sql/_project_record_ids.sql

There are only two main changes:

ST_Contains(a.geom, b.intersect_geom)

This additional condition ensures that projects are still grouped into a cluster when their intersection geometry lies entirely within the cluster geometries themselves. See the attached comparison of the clusters produced before and after the condition was added: clusters_comparision.csv
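
To illustrate the containment case, here is a minimal PostGIS sketch on toy data. The `projects` CTE, its column names, and the two rectangles are hypothetical and are not taken from sql/_project_record_ids.sql; only the ST_Contains(a.geom, b.intersect_geom) pattern comes from this change. When one project footprint sits wholly inside another, ST_Overlaps is false by definition, so a containment test on the intersection geometry is one way to keep such a pair together.

```sql
-- Toy data only: two hypothetical footprints, the second entirely inside the first.
WITH projects(record_id, geom) AS (
    VALUES
        ('a_outer', ST_MakeEnvelope(0, 0, 10, 10)),
        ('b_inner', ST_MakeEnvelope(2, 2, 4, 4))
),
pairs AS (
    -- One row per intersecting pair, carrying the intersection geometry along.
    SELECT
        a.record_id AS a_id,
        b.record_id AS b_id,
        a.geom      AS a_geom,
        b.geom      AS b_geom,
        ST_Intersection(a.geom, b.geom) AS intersect_geom
    FROM projects a
    JOIN projects b
        ON a.record_id < b.record_id
    WHERE ST_Intersects(a.geom, b.geom)
)
SELECT
    a_id,
    b_id,
    -- False here: one polygon completely contains the other.
    ST_Overlaps(a_geom, b_geom)         AS overlaps,
    -- True here: the intersection lies entirely within the outer footprint.
    ST_Contains(a_geom, intersect_geom) AS contains_intersection
FROM pairs;
```

A grouping rule that relies on the overlap test alone would drop this pair, which matches the kind of cluster the comparison file shows being restored.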

cluster id

The cluster id is added in two places so that ST_Intersection is only performed between projects that dbscan has already placed in the same cluster. I implemented this after observing that the query behaved correctly when run on a single cluster but not when more records (multiple clusters) were processed at once. My intuition is that when ST_Intersection was run between projects outside their dbscan cluster, it created extra geometries that confused ST_Overlaps and ST_Contains and broke up some clusters. Again, spot-checking the cluster comparisons before and after should be illuminating, since it shows which clusters should exist but did not in the previous implementation.
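
As a rough sketch of the cluster id restriction, the toy query below only pairs up records that share a cluster id before computing intersections. The table, column names, and hand-assigned cluster ids are hypothetical and chosen purely for illustration; the actual query in sql/_project_record_ids.sql may be structured differently.

```sql
-- Toy data only: cluster ids are assigned by hand, standing in for the dbscan output.
WITH projects(record_id, cluster_id, geom) AS (
    VALUES
        ('p1', 1, ST_MakeEnvelope(0, 0, 4, 4)),
        ('p2', 1, ST_MakeEnvelope(2, 2, 6, 6)),
        ('p3', 2, ST_MakeEnvelope(5, 5, 9, 9))
)
SELECT
    a.cluster_id,
    a.record_id AS a_id,
    b.record_id AS b_id,
    ST_Intersection(a.geom, b.geom) AS intersect_geom
FROM projects a
JOIN projects b
    ON a.cluster_id = b.cluster_id    -- compare only within one cluster
    AND a.record_id < b.record_id     -- each unordered pair once
WHERE ST_Intersects(a.geom, b.geom);
```

In this toy data p2 and p3 touch spatially but carry different cluster ids, so no intersection geometry is produced for them; only the p1/p2 pair comes back. Without the cluster id condition in the join, the p2/p3 intersection would also be generated, which is the kind of extra geometry suspected above.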

mbh329 commented 1 year ago

The output looks good to me and it ran on my machine with the latest data - I am a bit confused about what kind of spot checks I should be doing though. I reviewed the corrections_cluster and the final output and intermediate tables as well. Is there a record that you followed through the build to double check the clustering worked as expected?

mbh329 commented 1 year ago

@td928 the build is failing on my local machine. I think it might have to do with the edc_id in the edc_dcp_input file; I think you need to change line 86 in combine.sql to b.edc_id::numeric. There were also changes to dev that might need to be merged into this branch, 352-ST_UNION-NULL.

td928 commented 1 year ago

> @td928 the build is failing on my local machine. I think it might have to do with the edc_id in the edc_dcp_input file; I think you need to change line 86 in combine.sql to b.edc_id::numeric. There were also changes to dev that might need to be merged into this branch, 352-ST_UNION-NULL.

Great catch! Hopefully the merge from dev fixes it. @mbh329

mbh329 commented 1 year ago

Something is going on with the project_record_ids table. I can debug on Friday.

mbh329 commented 1 year ago

The build was successful, but the project_record_ids on my local build don't always match the new_cluster_ids in the cluster_comparison table. How concerned should we be about that?

mbh329 commented 1 year ago

The output looks good! I think this is ready to be merged