td928 closed this 1 year ago
The output looks good to me, and it ran on my machine with the latest data, but I am a bit confused about what kind of spot checks I should be doing. I reviewed the corrections_cluster table, the final output, and the intermediate tables. Is there a record that you followed through the build to double-check that the clustering worked as expected?
@td928 the build is failing on my local machine. I think it might have to do with the edc_id in the edc_dcp_input file; you may need to change line 86 in combine.sql to b.edc_id::numeric. There were also changes to dev that might need to be merged into this branch (352-ST_UNION-NULL).
Great catch! Hopefully the merge from dev fixes it. @mbh329
Something is going on with the project_record_ids table; I can debug it on Friday.
The build was successful, but the project_record_ids on my local build don't always match the new_cluster_ids in the cluster_comparison table. How concerned should we be about that?
The output looks good! I think this is ready to be merged.
Two reviewers preferred, one required.
Addresses #352, in which some clusters pretty clearly have spatial overlaps.
sql/_project_record_ids.sql
There are only two main changes:
ST_Contains(a.geom, b.intersect_geom)
This additional condition ensures that when the intersection geometry lies entirely within the cluster geometry itself, the records are still grouped into the same cluster. See clusters_comparision.csv below for a comparison of the clusters produced before and after the condition was added.
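To make the failure mode concrete, here is a toy model in Python rather than the actual query: axis-aligned boxes stand in for PostGIS geometries, and is_grouped is a hypothetical predicate that assumes the previous join relied on ST_Overlaps alone. Under the OGC definition, ST_Overlaps is false whenever one geometry contains the other, so an intersection geometry lying entirely inside the cluster geometry would never match until ST_Contains was OR-ed in.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle with positive area; a toy stand-in for a PostGIS polygon."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersection(a: Box, b: Box) -> Optional[Box]:
    """Shared area of two boxes, or None if they share no area (touching edges give None)."""
    xmin, ymin = max(a.xmin, b.xmin), max(a.ymin, b.ymin)
    xmax, ymax = min(a.xmax, b.xmax), min(a.ymax, b.ymax)
    if xmin >= xmax or ymin >= ymax:
        return None
    return Box(xmin, ymin, xmax, ymax)


def contains(a: Box, b: Box) -> bool:
    """Mirrors ST_Contains(a, b) for positive-area boxes: b lies entirely inside a (edges may touch)."""
    return a.xmin <= b.xmin and a.ymin <= b.ymin and b.xmax <= a.xmax and b.ymax <= a.ymax


def overlaps(a: Box, b: Box) -> bool:
    """Mirrors ST_Overlaps(a, b) for two areas: they share area but neither contains the other."""
    return intersection(a, b) is not None and not contains(a, b) and not contains(b, a)


def is_grouped(cluster_geom: Box, intersect_geom: Box, use_contains: bool) -> bool:
    """Hypothetical join predicate: overlaps only (before) vs. overlaps OR contains (after)."""
    if overlaps(cluster_geom, intersect_geom):
        return True
    return use_contains and contains(cluster_geom, intersect_geom)


if __name__ == "__main__":
    cluster = Box(0, 0, 10, 10)
    inner = Box(2, 2, 4, 4)  # intersection geometry fully inside the cluster geometry
    print(is_grouped(cluster, inner, use_contains=False))  # False: overlaps alone misses it
    print(is_grouped(cluster, inner, use_contains=True))   # True: contains catches it
```

Real geometries are not boxes, so this only illustrates the predicate logic; the boundary rules for irregular polygons are handled by PostGIS itself.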
cluster id
The cluster id is added in two places to ensure that ST_Intersection is performed only between projects that dbscan had already placed in the same cluster. This was implemented after observing that the query behaved correctly when run on a single cluster, but not when more records (multiple clusters) were processed at once. My intuition is that when ST_Intersection was computed between projects outside the cluster previously identified by dbscan, it created extra geometries that confused ST_Overlaps and ST_Contains and broke up some clusters. Again, spot-checking the before-and-after cluster comparisons should show which clusters should exist but didn't in the previous implementation.
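Scoping ST_Intersection to the dbscan clusters can be sketched in Python as a toy model. Boxes replace PostGIS geometries, and the record ids, cluster ids, and the function name pairwise_intersections are all hypothetical. The same_cluster_only flag plays the role of a join condition along the lines of a.cluster_id = b.cluster_id. With the filter off, pairs from different clusters produce extra intersection geometries, which is the kind of extra geometry suspected above of confusing ST_Overlaps and ST_Contains.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

# (xmin, ymin, xmax, ymax): a toy stand-in for a PostGIS geometry
Box = Tuple[float, float, float, float]
# (record_id, dbscan cluster id or None, geometry)
Project = Tuple[str, Optional[int], Box]


def intersect(a: Box, b: Box) -> Optional[Box]:
    """Shared area of two boxes, or None when they share no area."""
    xmin, ymin = max(a[0], b[0]), max(a[1], b[1])
    xmax, ymax = min(a[2], b[2]), min(a[3], b[3])
    if xmin >= xmax or ymin >= ymax:
        return None
    return (xmin, ymin, xmax, ymax)


def pairwise_intersections(
    projects: List[Project], same_cluster_only: bool
) -> Dict[Tuple[str, str], Box]:
    """Intersect every pair of projects, optionally keeping only pairs that share a cluster id.

    As with SQL equality, a NULL (None) cluster id never matches anything,
    so records without a cluster id are skipped when the filter is on.
    """
    result = {}
    for (id_a, c_a, g_a), (id_b, c_b, g_b) in combinations(projects, 2):
        if same_cluster_only and (c_a is None or c_b is None or c_a != c_b):
            continue
        shared = intersect(g_a, g_b)
        if shared is not None:
            result[(id_a, id_b)] = shared
    return result


if __name__ == "__main__":
    # Toy records chosen only to exercise the filter, not real data.
    projects = [
        ("p1", 1, (0, 0, 4, 4)),
        ("p2", 1, (2, 2, 6, 6)),
        ("p3", 2, (3, 3, 8, 8)),
    ]
    print(sorted(pairwise_intersections(projects, same_cluster_only=False)))
    # [('p1', 'p2'), ('p1', 'p3'), ('p2', 'p3')]
    print(sorted(pairwise_intersections(projects, same_cluster_only=True)))
    # [('p1', 'p2')]
```

Filtering before intersecting also keeps the number of candidate pairs proportional to cluster sizes rather than to the whole input, which matches the observation that the query behaved when run one cluster at a time.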