NYCPlanning / db-developments

🏠 🏘️ 🏗️ Developments Database
https://nycplanning.github.io/db-developments
8 stars 2 forks source link

HNY Join #580

Closed td928 closed 1 year ago

td928 commented 1 year ago

553 two reviewers preferred 🐠 🐟

Overview

the needs from the Housing team to be able to link DevDB and HNY records were not satisfied with last iteration with the HNY_devdb table. After initially trying to make the table HNY_devdb to fit with the requests, I opted to create a new HNY_devdb and rename the original table to DevDB_HNY.

The new HNY_devdbtable focuses on including the full list of HNY records which associate with DevDB records where as the original and new DevDB_HNY does not do this. It additionally do the job of aggregating the hny records when there are many hny records associated with a single devdb record. It however does not do any aggregation on one-devdb-to-many-hny-records or many-to-many relationship but simply present them as fully as possible.

_hny_matches.sql and _hny_join.sql

_hny_matches.sql remain largely unchanged and is simple rename from the _hny.sql. I made this change because I feel like the script is very long and the heavy lifting is really the HNY_matches table that got created in this process which the new _hny_join.sql also relies on.

The purpose of _hny_join.sql is really to include the information from the source HNY and makes its relationship with devdb records as legible as possible and resolve many to one devdb relationships when possible.

Review tips

I think the final success of the PR is going to depend a little bit on the feedbacks from Housing. I imagine we are going to run some output from this feature branch after approval and send it to housing team. Then if they also like what we produce then we will merge this in. So now I think the review could focus on simply whether this new pipeline works and it produces what I described above and should be sufficient.

SashaWeinstein commented 1 year ago

My first comment before I look at the code is that the names of HNY_devdb and DevDB_hny can probably be improved. Not sure about what to rename them to but think this is important change to be able to merge

SashaWeinstein commented 1 year ago

I ran queries like

SELECT * from HNY_devdb where one_dev_to_many_hny = 0 and one_hny_to_many_dev=0 ORDER by job_number;

and tried each combination of one dev to many hny and many hny to one dev and got expected output each time

mbh329 commented 1 year ago

Agree @SashaWeinstein that the names of HNY_devdb and DevDB_hny should be improved

mbh329 commented 1 year ago

Other than the table names and sasha's comments. I don't have much to add, the code ran as expected and the QA checks looked good

td928 commented 1 year ago

My first comment before I look at the code is that the names of HNY_devdb and DevDB_hny can probably be improved. Not sure about what to rename them to but think this is important change to be able to merge

yeah not my favorite here either. I use the convention whichever is the primary identifier comes first in the joined table. Kind of out of ideas at the moment. Any suggestions? @SashaWeinstein @mbh329

mbh329 commented 1 year ago

Would some sort of definitive suffix help here? _many2one for the newest table? I always find this part of devdb to one of the most confusing. if we can't settle on a name, maybe we can open an issue and come back to it

SashaWeinstein commented 1 year ago

Maybe a name for what is currently HNY_devdb that describes it's intended use? It's for housing's manual research right? So jobs_for_hny_research would differentiate from the tables used to construct devDB

td928 commented 1 year ago

Maybe a name for what is currently HNY_devdb that describes it's intended use? It's for housing's manual research right? So jobs_for_hny_research would differentiate from the tables used to construct devDB

Not quite. This is table is what Housing is hoping to let them look up devdb with HNY records. If HNY_devdb_join is also too non-descriptive, maybe we could go with HNY_devdb_lookup?

SashaWeinstein commented 1 year ago

Ok cool so to confirm, the new table would be HNY_devdb_lookup right? Could we rename the old table to DevDB_with_HNY?

AmandaDoyle commented 1 year ago

To recap the conversation based on my understanding.

SashaWeinstein commented 1 year ago

Hey @td928 want to flag that this is still outstanding. Think it's about ready to be merged in

td928 commented 1 year ago

Hey @td928 want to flag that this is still outstanding. Think it's about ready to be merged in

thanks for the flag. Added to the docstrings

td928 commented 1 year ago

@mbh329 @AmandaDoyle final major enhancement to be merged into dev. All the other enhancements are already merged into this feature branch to avoid conflicts. I tested this branch locally and output looks good to me. Let me know if there are any questions about this.

Thanks!

mbh329 commented 1 year ago

The build ran on my machine successfully - the tables look to be what we want but I will do a quick check tomorrow morning with fresh eyes. Otherwise looks good to me (besides some outstanding comments about documentation)

mbh329 commented 1 year ago

@td928 do you want to chat about the comments? The tables look good