NYCPlanning / db-developments

🏠 🏘️ 🏗️ Developments Database
https://nycplanning.github.io/db-developments

20Q2 - QAQC Master Issue #106

Closed: mgraber closed this issue 4 years ago

mgraber commented 4 years ago

QAQC table

Export fields in alphabetical order

qaqc_init.sql (issue #104; PR #114 closed)

qaqc_units.sql (PRs #117 and #118 closed)

qaqc_status.sql (PR #122 closed)

qaqc_mid.sql (PR #118 closed; PR #124 open)

Revision (b_likely_occ_desc logic):

add this -> occ_init or occ_prop contains "hotel", "assisted", "incapacitated", "restrained", "dormitories"

remove this -> job_type = Alteration and occ_initial contains "residential" and occ_proposed contains "hotel"

OR remove this -> job_type = Alteration and occ_initial contains "hotel" and occ_proposed contains "residential"
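A minimal Python sketch of the added occupancy condition, assuming `occ_init` and `occ_prop` are free-text occupancy descriptions, that "contains" means a case-insensitive substring match, and that a NULL occupancy is treated as empty. The function name `occ_flag` is hypothetical.

```python
# Keywords from the revised rule (assumed: case-insensitive substring match).
OCC_KEYWORDS = ("hotel", "assisted", "incapacitated", "restrained", "dormitories")


def occ_flag(occ_init, occ_prop):
    """Return True if either occupancy description contains a keyword.

    A NULL (None) occupancy is treated as an empty string.
    """
    for occ in (occ_init, occ_prop):
        text = (occ or "").lower()
        if any(keyword in text for keyword in OCC_KEYWORDS):
            return True
    return False
```

Assuming occ_initial/occ_proposed are the same fields as occ_init/occ_prop, the two removed Alteration-specific conditions (residential to hotel, hotel to residential) are subsumed by this check, since either one puts "hotel" in one of the two occupancy fields.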

Supplemental QAQC tables

Potential duplicates: REMOVE

job_number_a | job_number_b | equal_units | geo_bbl    | address
102133870    | 102285233    | 1           | 1019887501 | 519 WEST 135 STREET
...          | ...          | ...         | ...        | ...

New building and demolition overlap

job_number_dem | job_number_nb | geo_bbl
102037172      | 102037305     | 1021710036
...            | ...           | ...
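A sketch of how rows for this table could be built in Python. It assumes records are dicts with `job_number`, `job_type`, and `geo_bbl` keys, and that `job_type` uses the labels "Demolition" and "New Building" (hypothetical values; the actual coding may differ). Jobs with a NULL `geo_bbl` are skipped, and every demolition is paired with every new building on the same lot.

```python
from collections import defaultdict


def nb_dem_overlap(records):
    """Pair demolition and new-building jobs that share a geo_bbl.

    Returns rows with the columns job_number_dem, job_number_nb, geo_bbl.
    The job_type labels used below are assumptions.
    """
    dems = defaultdict(list)
    nbs = defaultdict(list)
    for r in records:
        bbl = r.get("geo_bbl")
        if bbl is None:
            continue  # cannot overlap without a lot identifier
        if r["job_type"] == "Demolition":
            dems[bbl].append(r["job_number"])
        elif r["job_type"] == "New Building":
            nbs[bbl].append(r["job_number"])

    rows = []
    # Only lots that have at least one demolition AND one new building.
    for bbl in sorted(dems.keys() & nbs.keys()):
        for dem in dems[bbl]:
            for nb in nbs[bbl]:
                rows.append({"job_number_dem": dem, "job_number_nb": nb, "geo_bbl": bbl})
    return rows
```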

Spatial QAQC #157

QAQC visualization

See issue #33

kschmidtDCP commented 4 years ago

@mgraber Here are HED's preferred field names. This list is provided in the same order as above, but the final output should list the fields in alphabetical order so that fields in the same group appear together.

In the QAQC document, HED had a question about the purpose of the "co_latest_units is negative" test that appeared in the old qc_outlier code. Can DE explain? Is this checking whether there are negative values listed on a CO?

levysamu commented 4 years ago

I think I am personally fine with EDM removing the BISTEST records automatically. We put it in because it was in the QAQC checklist that we had been working off of. If this is already taken care of, however, I don't feel much need to look into it further.

SPTKL commented 4 years ago


@levysamu We remove them and record them in the research table. This time around, we can record them in the QAQC table instead.

AmandaDoyle commented 4 years ago

@SPTKL @levysamu I think that we should continue to record these records in the QAQC table and remove them from the final output.
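A minimal Python sketch of "record in the QAQC table, remove from the final output". The thread does not say how BISTEST records are identified, so the test is passed in as a predicate; the `bistest` QAQC column name and the `split_bistest` function are hypothetical.

```python
def split_bistest(records, is_bistest):
    """Separate BISTEST records from the final output.

    is_bistest: caller-supplied predicate (how BISTEST records are
    identified is an assumption left to the caller).
    Returns (output_records, qaqc_rows); each QAQC row keeps the
    job_number and a hypothetical bistest flag.
    """
    output, qaqc = [], []
    for r in records:
        if is_bistest(r):
            qaqc.append({"job_number": r["job_number"], "bistest": True})
        else:
            output.append(r)
    return output, qaqc
```

Keeping the predicate as a parameter means the identification rule can change without touching the split itself.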

kschmidtDCP commented 4 years ago

@AmandaDoyle @mgraber @SPTKL Small revision to the logic for b_likely_occ_desc:

add this -> occ_init or occ_prop contains "hotel", "assisted", "incapacitated", "restrained", "dormitories"

remove this -> job_type = Alteration and occ_initial contains "residential" and occ_proposed contains "hotel"

OR remove this -> job_type = Alteration and occ_initial contains "hotel" and occ_proposed contains "residential"

keep -> job_desc contains any of the following words (case-insensitive, ~*): 'Hotel|Motel|Boarding|Hoste|Lodge|UG 5|Group 5|Grp 5|Class B|SRO|Single room|Furnished|Rooming unit|Dorm|Transient|Homeless|Shelter|Group quarter|Beds|Convent|Monastery|Accommodation|Harassment|CNH|Settlement|Halfway|Nursing home|Assisted'
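The "keep" condition can be mirrored outside the database with Python's `re` module, as a sketch: `re.IGNORECASE` plays the role of PostgreSQL's `~*`, the terms are escaped and joined into one alternation, and a NULL `job_desc` is assumed not to match. The names `JOB_DESC_TERMS` and `job_desc_flag` are hypothetical.

```python
import re

# Keyword list from the "keep" condition above.
JOB_DESC_TERMS = [
    "Hotel", "Motel", "Boarding", "Hoste", "Lodge", "UG 5", "Group 5", "Grp 5",
    "Class B", "SRO", "Single room", "Furnished", "Rooming unit", "Dorm",
    "Transient", "Homeless", "Shelter", "Group quarter", "Beds", "Convent",
    "Monastery", "Accommodation", "Harassment", "CNH", "Settlement", "Halfway",
    "Nursing home", "Assisted",
]

# Case-insensitive alternation, analogous to job_desc ~* '...'.
JOB_DESC_RE = re.compile(
    "|".join(re.escape(term) for term in JOB_DESC_TERMS), re.IGNORECASE
)


def job_desc_flag(job_desc):
    """True if job_desc contains any keyword; NULL (None) never matches."""
    return job_desc is not None and JOB_DESC_RE.search(job_desc) is not None
```

Two things to keep in mind with this kind of pattern. A trailing `|` would add an empty alternative, and an empty branch matches the empty string, so every non-null job_desc would be flagged; the list should not end with `|`. Terms also match as substrings, as with `~*`, so "SRO" matches "classroom" and "Dorm" matches "dormer"; word boundaries would be needed to avoid that.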

kschmidtDCP commented 4 years ago

@AmandaDoyle @mgraber @SPTKL Here are the additional spatial checks we would like to perform:

kschmidtDCP commented 4 years ago

We have a new idea for how to flag duplicates that we'd like to propose, given the request that we present potential duplicates as groups/clusters rather than pairwise comparisons. Rather than having one column for equal unit matches and another for different unit matches, we would like to propose:

one column for duplicates with equal units and another for duplicates regardless of units

@AmandaDoyle @mgraber @SPTKL This is a great idea. Please do this!

kschmidtDCP commented 4 years ago

Can you sign off on the New building and demolition overlap QAQC table schema under "Supplemental QAQC tables" in #106?

@AmandaDoyle @mgraber @SPTKL Signed off!

levysamu commented 4 years ago

When finding potential duplicates, do we treat two records that both have NULL units as an "equal units" match? What about two records that share an address but both have NULL geo_bbl? (#106)

If the units are NULL, do not mark the records as having equal units. If the addresses are the same but the BBL is NULL, mark them as having the same address.
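A Python sketch of the proposed clustering (one group id for "duplicates regardless of units" and one for "duplicates with equal units") that also follows the NULL rules above. It assumes the address string is the duplicate key and is matched exactly; the `units` input field and the `dup_all` / `dup_equal_units` output columns are hypothetical names.

```python
from collections import defaultdict


def duplicate_groups(records):
    """Assign cluster ids to potential duplicates.

    Assumes each record is a dict with job_number, address and units keys.
    Returns {job_number: {"dup_all": id_or_None, "dup_equal_units": id_or_None}}.
    """
    by_address = defaultdict(list)
    by_address_units = defaultdict(list)
    for r in records:
        address = r.get("address")
        if address is None:
            continue  # no address, nothing to match on
        # geo_bbl is not part of the key, so a NULL BBL does not block a match.
        by_address[address].append(r["job_number"])
        # NULL units never count as "equal units".
        if r.get("units") is not None:
            by_address_units[(address, r["units"])].append(r["job_number"])

    result = {
        r["job_number"]: {"dup_all": None, "dup_equal_units": None} for r in records
    }

    def assign(groups, column):
        # Only groups with two or more jobs are duplicates; sorted keys give stable ids.
        dup_keys = sorted(k for k, jobs in groups.items() if len(jobs) > 1)
        for group_id, key in enumerate(dup_keys, start=1):
            for job in groups[key]:
                result[job][column] = group_id

    assign(by_address, "dup_all")
    assign(by_address_units, "dup_equal_units")
    return result
```

Grouping by key rather than comparing pairs gives each cluster a single id, so three or more records at one address land in one group instead of several pairwise rows. Real addresses would likely need normalization before exact matching.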