NYCPlanning / db-developments

🏠 🏘️ 🏗️ Developments Database
https://nycplanning.github.io/db-developments
8 stars 2 forks source link

20Q2 - Questions to HED #100

Closed AmandaDoyle closed 4 years ago

AmandaDoyle commented 4 years ago

Questions 2020/07/06 - 2020/07/10

Archived Questions

2020/06/29 - 2020/07/02 - [x] For units_co_prop_mismatch do you really need us to map `C -CO` -> `FCO` and `T- TCO` -> `TCO`? We recommend just outputting `C -CO` and `T- TCO` so that certificate types are consistent across tables. #224 - [x] Can you confirm the [logic](https://github.com/NYCPlanning/db-developments/issues/203#issuecomment-651218061) for the b_nonres_with_units QAQC check? #203 - [x] Need feedback from HED on classa_incmpl - [x] Does each census block fall within a single community district and a single council district? If so, where can we find a lookup for this relationship? #213 - [x] After reviewing the logic for populating `enlargement` I think [the logic implemented](https://github.com/NYCPlanning/db-developments/wiki/20Q2-Attribute-mapping#enlargement) is correct compared to what you outlined in the document. Would you like the values that are mapped updated (i.e. Horizontal' instead of 'Enlrg_Horiz')? 2020/06/22 - 2020/06/27 - [x] Given that `nonres_flag` and `resid_flag` are supposed to identify something as Mixed Use should the [logic](https://github.com/NYCPlanning/db-developments/blob/74aaf2b1df5f74b2b6383f8158b65312b180f698/developments_build/sql/_function.sql#L283) of only assigning `nonres_flag` if `_resid_flag` is NULL still be applied? #49 - [x] Can you sign off on the [logic](https://github.com/NYCPlanning/db-developments/wiki/20Q2-Attribute-mapping#job_inactive) of flagging something as inactive? - [x] Should the date we compare to to flag jobs as inactive be a static date (i.e. July 1, 2020) or the running date of when the script is run? #164 - [x] Can you confirm that `classa_complt` should be the `numofdwellingunits` associated with the latest final CO if there is one or the latest T-CO? #177 - [x] Should `stories_prop` only be populated for NB and A1 and NULL for DM? - [x] Can you review the [final logic](https://github.com/NYCPlanning/db-developments/wiki/20Q2-Attribute-mapping) for all fields and let us know if anything is inconsistent with your understanding? - [x] `dcpedited` is currently formatted as an array. How would like `dcpedited` to be formatted? - [x] Can you confirm that the final outputs in issue #186 are correct and comprehensive? - [x] Can you clarify the logic for matching on units for flagging jobs as inactive [here](https://github.com/NYCPlanning/db-developments/issues/195#issuecomment-650167394) #195 ? - [ ] ~~What file from the M drive should we use to get `depdrainarea` and `deppumpstatn`? #52~~ Did not receive files in time. - [x] Can we get confirmation about the logic to assign status codes from dates? #160 - [x] **Addressed in Teams** How do we assign status values? Using DOB status field plus dates depending on the job type, or just dates. #150
2020/06/15 - 2020/06/19 - [x] Describe the Lon Lat Corrections process and rationale #134 - [x] When finding potential duplicates, do we treat two records that both have NULL units as an "equal units" match? What about two records that share an address but both have NULL geo_bbl? ( #106 ) - [x] Can you sign off on the New building and demolition overlap QAQC table schema under "Supplemental QAQC tables" in #106? - [x] Also in #106 We have a new idea for how to flag duplicates that we'd like to propose, given the request that we present potential duplicates as groups/clusters rather than pairwise comparisons. Rather than having one column for equal unit matches and another for different unit matches, we would like to propose: 1) duplicates with equal units and 2) duplicates regardless of units For example, imagine three records sharing a bbl and address, 2 have equal units and one has different units. With the proposed method you'd have two columns containing the shared bbl & address (a stable alternative to a cluster id). Duplicates with equal units would be grouping together the two records, and duplicates regardless of units would be grouping together the three records. Here is another example. In this case, each of these job_numbers has a match with equal units, as well as two matches with different units. Sorting by the `dup_bbl_address` column allows you see groups of potential duplicates, regardless of units. If you instead only want to focus your attention on potential duplicates that share units, you would group by `dup_bbl_address_units` Screen Shot 2020-06-19 at 8 20 52 AM - [x] Based on your email the files in #58 are not the correct files correct,and we're waiting for updated data? - [x] What should the qaqc field be for `dem_nb_overlap` (temporary name)? It indicates spatial overlap between new buildings and demolitions, as requested in #57 (PR #124). Answer: dem_nb_overlap - [x] Should we include CO fields to power app? if so what to name them? #107 - [x] How should we assign the `geom` for records that have address in `TPAD`? Should the Lat/Long come from GeoSupport via the address or should it come by joining the DOB BIN or BBL to building footprints or MapPLUTO? The rest of the spatial data attributes are coming from spatial joins. #63 #74 #79 GRU says that if the record is in TPAD, get the point geometry by taking the centroid from the clipped MapPLUTO BBL - [x] Did you follow-up with DOB about zoning sqft data? #58 - [x] Are there particular field names you would like for each boolean flag in the QAQC checks? #106 - [x] Does "Make status Complete for demolition if it has a value in Date_StatusX" #40 overwrite previous logic in assigning `5. complete` for demolitions (or do we add ^ in addition to previous logic) ? preivous logic: ```sql WHEN a.job_type = 'Demolition' AND status_translate(a._status) IN ('5. Complete','3. Permited') THEN '5. Complete' ``` - [x] Final sign-off on schema as listed in #52, which we've updated to be in sync with the spreadsheet and to reflect that capitalization changes cannot be applied - [x] Is `Nonresid_flag` an intuitive name for describing a building with both residential and nonresidential functions? So a mixed used building would be a building that has both a true for `resid_flag` and `Nonresid_flag`? So `Nonresid_flag` isn't really replacing mixed use, it's the function of the two fields that are replacing the mixed use flag? #52 - [x] One of the QAQC checks is for BISTEST. We already have [code](https://github.com/NYCPlanning/db-developments/blob/637f7391a457037223f5ba48b17dab36202442ee/developments_build/sql/init.sql#L138-L148) to put these into the research table these records programatically by dumping them as 'remove' in the corrections file. Is the QAQC check to find BISTEST records in the raw data (before we remove them) or to find any records that got by our automatic removal? Should we not be taking these records out automatically? #106 - [x] Similarly, we remove duplicates programatically. Should we not automatically remove them? If we keep the [removal step](https://github.com/NYCPlanning/db-developments/blob/637f7391a457037223f5ba48b17dab36202442ee/developments_build/sql/init.sql#L107-L129), are the duplicate QAQC checks to be done before or after removal? #31 #106 - [ ] ~~Do you have the logic on how to add fields from Furman Center Core dataset? #48 It says HED input was given, but I don't see it.~~ (Not executing in this enhancement period) - [x] Furman field names are inconsistent between schema spreadsheet and issue #48. Which do we use? We should clarify this in the schema issue. #52 - [ ] ~~Do you have the logic on how to add fields from DHCR dataset? #47 It says HED input was given, but I don't see it.~~ (Not executing in this enhancement period) - [x] Previously we used occupancy initial and proposed fields to assign the resid_flag value; however, based on new logic you only want to assign something as residential if the record has a value in one of the 6 unit fields. Does this make sense? How will non residential units be captured or removed? - [x] Related to the previous question. Previously, we set `classa_init` to 0 if: ```sql job_type='Alteration' AND occ_initial ~* 'hotel' AND occ_proposed ~* 'Residential|Assisted' AND x_mixeduse is null ``` and `classa_prop` to 0 if: ```sql job_type='Alteration' AND occ_initial ~* 'hotel' AND occ_proposed ~* 'Residential|Assisted' AND x_mixeduse is null ``` We are now using `resid_flag` and `nonres_flag` in place of `x_mixeduse`. If we set `resid_flag` and `nonres_flag` based on `classa_init` and `classa_prop`, it does not make sense for us to turn around and use these fields to alter `classa_init` and `classa_prop`. Should we still be automatically setting these cases to zero? Are they accounted for in the QAQC flag `b_likely_occ_desc` (see #106 for definition of this check)?
2020/06/08 - 2020/06/12 ## Field names & schema changes - [x] For field, names should we use the 10 char max field name for the files outputted by data engineering? Please note that we can only output lowercase letters. Related to #52 - [x] Where does the 13 char max field name come into play? - [x] What values should the fields `nonres_flag` and `resid_flag` contain? 'Y' and NULL? Related to #27 - [x] Is the new field `nonres_flag` simply the opposite of `resid_flag`? If so, what is the value of including both? Related to #27 - [x] How is `nonres_flag` set? See [thread in #27](https://github.com/NYCPlanning/db-developments/issues/27#issuecomment-643348226). - [x] need more clarity on **Recode City Owned, Owner Type, Non-Profit into a single field** #64 (need a look up table, or clear logic) ## Corrections and QAQC - [x] #53 How do we correct invalid dates, and who needs to be involved in this conversation? Related to - [x] #26 Do we still have to output jobs where occ_category = Other and units_init/prop <> 0 for research given how we are recoding occ_category? ## Units - [x] What are valid reasons for programmatically changing unit fields as in #69? ## HNY - [x] #105 Where should edits to hny fields occur? - [x] Is there consensus on what is proposed for the HNY matching workflow outlined in #83 ? ## Other - [x] #57 What date parameters should we use for matching NB to DMs? - [x] Check to confirm confirm that we should NULL out any job descriptions that do have any letter characters in #46 - [x] In the new deduplication logic (#31), we filter to jobs that are not inactive. Do _both_ of the potential duplicate jobs need to have `x_inactive IS NULL` to be output to the QAQC table, or only one job?
kschmidtDCP commented 4 years ago

Is the new field nonres_flag simply the opposite of resid_flag? If so, what is the value of including both? Related to #27

These are not mutually exclusive, so if a building contains both residential and nonresidential uses (like retail) both flags would be applied.

kschmidtDCP commented 4 years ago

@AmandaDoyle @mgraber @SPTKL

Related to the previous question. Previously, we set classa_init to 0 if: job_type='Alteration'
AND occ_initial ~ 'hotel'
AND occ_proposed ~
'Residential|Assisted'
AND x_mixeduse is null
and classa_prop to 0 if:

job_type='Alteration'
AND occ_initial ~ 'hotel'
AND occ_proposed ~
'Residential|Assisted'
AND x_mixeduse is null

We are now using resid_flag and nonres_flag in place of x_mixeduse. If we set resid_flag and nonres_flag based on classa_init and classa_prop, it does not make sense for us to turn around and use these fields to alter classa_init and classa_prop. Should we still be automatically setting these cases to zero?

No. Issue #69 states the only remaining reasons for changing units_init and unit_prop. These two snippets of code should no longer be used.

Are they accounted for in the QAQC flag b_likely_occ_desc (see #106 for definition of this check)?

Yes, this is one intent of the QAQC flag b_likely_occ_desc.

levysamu commented 4 years ago

Responding to the review of the Inactive code, here is what we want it to look like:

Dependent on date_complete, job_status, date_lastupdt A job can only be inactive if date_complete is NULL A job is set to inactive for:

All jobs where the status is 9. Withdrawn

Set as Inactive if job_status is 
1. Filed Application OR 
2. Approved Application OR 
3. Permitted for Construction 

AND date_lastupdt > 3 years before vintage date (for example, greater than three years before June 30, 2020 for version 20Q2). 
Note that this applies to all job types, including demolitions.
Set as Inactive if job_status is: 

1. Filed Application OR 
2. Approved Application OR 
3. Permitted for Construction 

and matches with a job that has a status of either 4. Partially Completed Construction or 5. Completed Construction on the following fields: 

- classa_init (where units are > 0) AND 
- classa_prop (where units are > 0) AND 
- address AND 
- job_type 

AND the date_lastupdt of the incomplete job is before the date_lastupdt of the complete job. 
levysamu commented 4 years ago

Stories_prop should be null for demolitions and stories_init should be null for new buildings.

kschmidtDCP commented 4 years ago

Can you review the final logic for all fields and let us know if anything is inconsistent with your understanding?

@AmandaDoyle @mgraber @SPTKL Sam and I made some edits to the attribute mapping, here: https://nyco365.sharepoint.com/sites/NYCPLANNING/hed/_layouts/15/doc.aspx?sourcedoc={86a4bb4d-25b5-4183-8a28-97ffb626e0cf}&action=edit We're happy to discuss on a call if you like.

kschmidtDCP commented 4 years ago

dcpedited is currently formatted as an array. How would like dcpedited to be formatted?

@AmandaDoyle By array, does that mean it's a comma separated list of the fields that were edited? If so, that is what we would like.

levysamu commented 4 years ago

For the enlargment field, the logic is correct. We missed the distinction between = and <> the first time around, and thought they were all =. We do want you to change the mapped values to Horizontal and Vertical.

kschmidtDCP commented 4 years ago

Need feedback from HED on classa_incmpl.

We would not like to complicate the logic for classa_complete and classa_incmpl. Instead, here's what we propose for 20Q2:

  1. Delete the classa_complete and classa_incmpl fields.
  2. Create a new field called units_co located after otherb_prop that is populated with the number of units from the most recent certificate of occupancy.
  3. For discussion potentially for 20Q4: Adjust logic for QC check "units_co_prop_mismatch." _Output CO type if job_type is NB or A1 and units_co <> unitsprop. CO type should be “TCO” if the last CO was temporary, or “FCO” if the last CO was final.
AmandaDoyle commented 4 years ago

@kschmidtDCP there would be no cap on units_co correct?

kschmidtDCP commented 4 years ago

@kschmidtDCP there would be no cap on units_co correct?

That's correct. It's just a straight reporting of the CO units. Also note that we added a third point above for discussion.

AmandaDoyle commented 4 years ago

@kschmidtDCP great - thanks.
I opened the 3rd point as an enhancement for a future version

AmandaDoyle commented 4 years ago

Noting here: @mgraber @kschmidtDCP @levysamu Touched base with LS Census blocks will not align with council districts or community districts, so we should just look to use the council district or community district assigned to a record and not do a lookup from block to create the aggregate tables

Edit: Nevermind. PL sent a crosswalk this morning

levysamu commented 4 years ago

Noting here: @mgraber @kschmidtDCP @levysamu Touched base with LS Census blocks will not align with council districts or community districts, so we should just look to use the council district or community district assigned to a record and not do a lookup from block to create the aggregate tables

Just a flag that Pop responded with a lookup table between blocks and PUMAs, Community Districts, and Council Districts. You were copied on the email @AmandaDoyle.

AmandaDoyle commented 4 years ago

@levysamu yep - I edited my comment and shared the crosswalk.

levysamu commented 4 years ago
  1. I am uncomfortable with the idea of converting NULLs to 0s. I sent an email just now asking Hannah if there is room to change the symbology to allow for the inclusion of NULLs without turning them into 0s.
  2. I am struggling to imagine a scenario where a withdrawn job has received a certificate of occupancy. Is the scenario you are describing for demolitions, A1s, or NBs? Can you send a few examples over so I can review them to make sure we are making the correct decision?
  3. Do you mean add the borough code to the community district to make it a UID (BX01 and BK01, vs 1 for both Brooklyn and the Bronx)? If so, yes. If not, I'm not sure what you mean.
  4. I think this depends entirely on the extent of the problem and how long it would take you to handle it. As a general rule, I think I would rather get the database sooner so that we can start our research. Could we do that work in parallel?
AmandaDoyle commented 4 years ago

@levysamu made the decision to leave units net as NULL for displaying in the capital planning platform. As a result these records will not appear on the map in the capital planning platform

github-actions[bot] commented 4 years ago

Stale issue message

github-actions[bot] commented 4 years ago

Stale issue message