NYCPlanning / db-developments

🏠 🏘️ 🏗️ Developments Database
https://nycplanning.github.io/db-developments
8 stars 2 forks source link

Do not take geometries from million bins #489

Closed AmandaDoyle closed 2 years ago

AmandaDoyle commented 2 years ago

Add clause here and here in _geo.sq to that million bins cannot join to each other adding WHERE b.bin IS NOT LIKE%000000` after each ON statement should do the trick

mbh329 commented 2 years ago

Seems like dropping the million bins got dropped in an earlier refactor https://github.com/NYCPlanning/db-developments/blob/6bba931b1ef661f9cb8500fcd86584752121b4e6/sql/_geo.sql#L72

SashaWeinstein commented 2 years ago

@AmandaDoyle what are the geo_bins in the _GEO_devdb table? Not essential to solving this problem but we are curious

SashaWeinstein commented 2 years ago

Our fix to filter out million bins in _geo.sql has introduced a new error in _hny.sql.

We get this error when trying to build via 02_build_devdb.sh psql:sql/_hny.sql:431: ERROR: invalid input syntax for type integer: "B00558782"

Looking into the code we see that the error is thrown when the many_to_one CTE is unioned to one_to_many. It seems to be a data type issue. First we thought the issue was in the union of one_to_many and many_to_one. But the error is thrown any time many_to_one is queried so it is actually in the creation of many_to_one.

Our solution was based off the observation that the HNY_matches table is split into one_to_one, one_to_many, and many_to_one CTE. So we thought that sending job_number to text when HNY_matches was created was the cleanest way to resolve.

That didn't work so we tried casting job numbers in the creation of many_to_one. However that didn't work either so maybe our original understanding of the problem was wrong. Or the difficulty lies with our lack of SQL proficiency. Either way we have not made much progress on this issue.

AmandaDoyle commented 2 years ago

@SashaWeinstein can you write out what your solution was to filter out million bins in _geo.sql? Similar to how I wrote out my solution in the initial comment.

SashaWeinstein commented 2 years ago

Addressed in pull request 490