Closed AmandaDoyle closed 4 years ago
Is the new field nonres_flag simply the opposite of resid_flag? If so, what is the value of including both? Related to #27
These are not mutually exclusive, so if a building contains both residential and nonresidential uses (like retail) both flags would be applied.
@AmandaDoyle @mgraber @SPTKL
Related to the previous question. Previously, we set classa_init to 0 if:
job_type='Alteration'
AND occ_initial ~ 'hotel'
AND occ_proposed ~ 'Residential|Assisted'
AND x_mixeduse is null
and classa_prop to 0 if:
job_type='Alteration'
AND occ_initial ~ 'hotel'
AND occ_proposed ~ 'Residential|Assisted'
AND x_mixeduse is null
We are now using resid_flag and nonres_flag in place of x_mixeduse. If we set resid_flag and nonres_flag based on classa_init and classa_prop, it does not make sense for us to turn around and use these fields to alter classa_init and classa_prop. Should we still be automatically setting these cases to zero?
No. Issue #69 states the only remaining reasons for changing units_init and units_prop. These two snippets of code should no longer be used.
Are they accounted for in the QAQC flag b_likely_occ_desc (see #106 for definition of this check)?
Yes, this is one intent of the QAQC flag b_likely_occ_desc.
Responding to the review of the Inactive code, here is what we want it to look like:
Dependent on date_complete, job_status, date_lastupdt.
A job can only be inactive if date_complete is NULL.
A job is set to inactive for:
All jobs where the status is 9. Withdrawn
Set as Inactive if job_status is
1. Filed Application OR
2. Approved Application OR
3. Permitted for Construction
AND date_lastupdt is more than 3 years before the vintage date (for example, earlier than June 30, 2017 for version 20Q2, whose vintage date is June 30, 2020).
Note that this applies to all job types, including demolitions.
Set as Inactive if job_status is:
1. Filed Application OR
2. Approved Application OR
3. Permitted for Construction
and matches with a job that has a status of either 4. Partially Completed Construction or 5. Completed Construction on the following fields:
- classa_init (where units are > 0) AND
- classa_prop (where units are > 0) AND
- address AND
- job_type
AND the date_lastupdt of the incomplete job is before the date_lastupdt of the complete job.
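For discussion, here is a rough Python sketch of the three rules above. It is not the repository's SQL. The `Job` class, the helper names, and the reading of "where units are > 0" as "both jobs carry the same positive value" are assumptions made for illustration only.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Sequence

# Status labels as written in this thread.
INCOMPLETE = {"1. Filed Application", "2. Approved Application",
              "3. Permitted for Construction"}
COMPLETE = {"4. Partially Completed Construction", "5. Completed Construction"}
WITHDRAWN = "9. Withdrawn"


@dataclass
class Job:  # hypothetical record type, not the actual table schema
    job_number: str
    job_type: str
    job_status: str
    address: str
    classa_init: Optional[int]
    classa_prop: Optional[int]
    date_lastupdt: Optional[date]
    date_complete: Optional[date] = None


def years_before(d: date, years: int) -> date:
    """Return the same calendar day `years` earlier (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def _same_positive(a: Optional[int], b: Optional[int]) -> bool:
    # Assumption: "where units are > 0" means equal, non-null, positive values.
    return a is not None and a > 0 and a == b


def matches_completed(job: Job, other: Job) -> bool:
    """Rule 3: an incomplete job superseded by a later completed job."""
    return (
        other.job_status in COMPLETE
        and _same_positive(job.classa_init, other.classa_init)
        and _same_positive(job.classa_prop, other.classa_prop)
        and job.address == other.address
        and job.job_type == other.job_type
        and job.date_lastupdt is not None
        and other.date_lastupdt is not None
        and job.date_lastupdt < other.date_lastupdt
    )


def is_inactive(job: Job, all_jobs: Sequence[Job], vintage: date) -> bool:
    if job.date_complete is not None:   # precondition for every rule
        return False
    if job.job_status == WITHDRAWN:     # rule 1
        return True
    if job.job_status not in INCOMPLETE:
        return False
    cutoff = years_before(vintage, 3)   # rule 2, applies to all job types
    if job.date_lastupdt is not None and job.date_lastupdt < cutoff:
        return True
    return any(matches_completed(job, o) for o in all_jobs if o is not job)
```

With a vintage date of `date(2020, 6, 30)`, a Filed Application last updated on June 29, 2017 would be flagged, while one last updated on June 30, 2017 would not.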
Stories_prop should be null for demolitions and stories_init should be null for new buildings.
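A minimal sketch of the stories rule, again in Python rather than the build's SQL. The function name is hypothetical, and the job type label 'New Building' is an assumption ('Demolition' is the label used elsewhere in this thread).

```python
from typing import Optional, Tuple


def clean_stories(job_type: str,
                  stories_init: Optional[float],
                  stories_prop: Optional[float]
                  ) -> Tuple[Optional[float], Optional[float]]:
    """Null out the stories value that cannot apply to the job type."""
    if job_type == "Demolition":
        stories_prop = None   # nothing is proposed after a demolition
    elif job_type == "New Building":  # assumed label for new buildings
        stories_init = None   # nothing existed before a new building
    return stories_init, stories_prop
```

Other job types, such as alterations, pass through unchanged.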
Can you review the final logic for all fields and let us know if anything is inconsistent with your understanding?
@AmandaDoyle @mgraber @SPTKL Sam and I made some edits to the attribute mapping, here: https://nyco365.sharepoint.com/sites/NYCPLANNING/hed/_layouts/15/doc.aspx?sourcedoc={86a4bb4d-25b5-4183-8a28-97ffb626e0cf}&action=edit
We're happy to discuss on a call if you like.
dcpedited is currently formatted as an array. How would you like dcpedited to be formatted?
@AmandaDoyle By array, does that mean it's a comma-separated list of the fields that were edited? If so, that is what we would like.
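To make the requested format concrete, here is a small Python sketch that turns a list of edited field names into one comma-separated string. The function name, the choice of a bare comma with no space, and returning None when nothing was edited are all assumptions.

```python
from typing import Optional, Sequence


def format_dcpedited(fields: Optional[Sequence[str]]) -> Optional[str]:
    """Join edited field names into a single comma-separated string."""
    if not fields:
        return None  # assumption: no edits is stored as NULL, not ''
    return ",".join(fields)
```

For example, `format_dcpedited(["classa_init", "classa_prop"])` gives `"classa_init,classa_prop"`.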
For the enlargement field, the logic is correct. We missed the distinction between = and <> the first time around, and thought they were all =. We do want you to change the mapped values to Horizontal and Vertical.
Need feedback from HED on classa_incmpl.
We would not like to complicate the logic for classa_complete and classa_incmpl. Instead, here's what we propose for 20Q2:
@kschmidtDCP there would be no cap on units_co, correct?
That's correct. It's just a straight reporting of the CO units. Also note that we added a third point above for discussion.
@kschmidtDCP great - thanks.
I opened the 3rd point as an enhancement for a future version
Noting here: @mgraber @kschmidtDCP @levysamu Touched base with LS. Census blocks will not align with council districts or community districts, so we should use the council district or community district already assigned to a record rather than doing a lookup from block to create the aggregate tables.
Edit: Nevermind. PL sent a crosswalk this morning
Just a flag that Pop responded with a lookup table between blocks and PUMAs, Community Districts, and Council Districts. You were copied on the email @AmandaDoyle.
@levysamu yep - I edited my comment and shared the crosswalk.
@levysamu made the decision to leave units_net as NULL for display in the capital planning platform. As a result, these records will not appear on the map in the capital planning platform.
Questions 2020/07/06 - 2020/07/10
[ ] Updated DOE subdistricts shapefile? The current shapefile has gaps between polygons.
[x] For displaying data in the capital planning platform should we update units_net from NULL to 0?
[x] In cases where a job has a non-null date complete and is Withdrawn, should it be marked as inactive? Right now if date complete is not null a job cannot be inactive. Or should jobs not be marked as inactive if date complete is not null? The current logic for setting a job as inactive is here.
[x] Should we remove boro from aggregate_commntydst so that community district is a unique ID? Clarification: These community districts cross borough boundaries: 207, 208, 302, 306, 402
[x] Sometimes values are not valid in the owner_biznm, owner_address, owner_zipcode, and owner_phone fields. Would you like us to do a light cleanup of invalid values, and is it worth the effort?
Archived Questions
2020/06/29 - 2020/07/02
- [x] For units_co_prop_mismatch do you really need us to map `C -CO` -> `FCO` and `T- TCO` -> `TCO`? We recommend just outputting `C -CO` and `T- TCO` so that certificate types are consistent across tables. #224
- [x] Can you confirm the [logic](https://github.com/NYCPlanning/db-developments/issues/203#issuecomment-651218061) for the b_nonres_with_units QAQC check? #203
- [x] Need feedback from HED on classa_incmpl
- [x] Does each census block fall within a single community district and a single council district? If so, where can we find a lookup for this relationship? #213
- [x] After reviewing the logic for populating `enlargement` I think [the logic implemented](https://github.com/NYCPlanning/db-developments/wiki/20Q2-Attribute-mapping#enlargement) is correct compared to what you outlined in the document. Would you like the values that are mapped updated (i.e. 'Horizontal' instead of 'Enlrg_Horiz')?

2020/06/22 - 2020/06/27
- [x] Given that `nonres_flag` and `resid_flag` are supposed to identify something as Mixed Use should the [logic](https://github.com/NYCPlanning/db-developments/blob/74aaf2b1df5f74b2b6383f8158b65312b180f698/developments_build/sql/_function.sql#L283) of only assigning `nonres_flag` if `_resid_flag` is NULL still be applied? #49
- [x] Can you sign off on the [logic](https://github.com/NYCPlanning/db-developments/wiki/20Q2-Attribute-mapping#job_inactive) of flagging something as inactive?
- [x] Should the date we compare to when flagging jobs as inactive be a static date (i.e. July 1, 2020) or the running date of when the script is run? #164
- [x] Can you confirm that `classa_complt` should be the `numofdwellingunits` associated with the latest final CO if there is one or the latest T-CO? #177
- [x] Should `stories_prop` only be populated for NB and A1 and NULL for DM?
- [x] Can you review the [final logic](https://github.com/NYCPlanning/db-developments/wiki/20Q2-Attribute-mapping) for all fields and let us know if anything is inconsistent with your understanding?
- [x] `dcpedited` is currently formatted as an array. How would you like `dcpedited` to be formatted?
- [x] Can you confirm that the final outputs in issue #186 are correct and comprehensive?
- [x] Can you clarify the logic for matching on units for flagging jobs as inactive [here](https://github.com/NYCPlanning/db-developments/issues/195#issuecomment-650167394) #195 ?
- [ ] ~~What file from the M drive should we use to get `depdrainarea` and `deppumpstatn`? #52~~ Did not receive files in time.
- [x] Can we get confirmation about the logic to assign status codes from dates? #160
- [x] **Addressed in Teams** How do we assign status values? Using DOB status field plus dates depending on the job type, or just dates. #150

2020/06/15 - 2020/06/19
- [x] Describe the Lon Lat Corrections process and rationale #134
- [x] When finding potential duplicates, do we treat two records that both have NULL units as an "equal units" match? What about two records that share an address but both have NULL geo_bbl? ( #106 )
- [x] Can you sign off on the New building and demolition overlap QAQC table schema under "Supplemental QAQC tables" in #106?
- [x] Also in #106 We have a new idea for how to flag duplicates that we'd like to propose, given the request that we present potential duplicates as groups/clusters rather than pairwise comparisons. Rather than having one column for equal unit matches and another for different unit matches, we would like to propose: 1) duplicates with equal units and 2) duplicates regardless of units. For example, imagine three records sharing a bbl and address, 2 have equal units and one has different units. With the proposed method you'd have two columns containing the shared bbl & address (a stable alternative to a cluster id). Duplicates with equal units would be grouping together the two records, and duplicates regardless of units would be grouping together the three records. Here is another example. In this case, each of these job_numbers has a match with equal units, as well as two matches with different units. Sorting by the `dup_bbl_address` column allows you to see groups of potential duplicates, regardless of units. If you instead only want to focus your attention on potential duplicates that share units, you would group by `dup_bbl_address_units`
- [x] Based on your email, the files in #58 are not the correct files, correct, and we're waiting for updated data?
- [x] What should the qaqc field be for `dem_nb_overlap` (temporary name)? It indicates spatial overlap between new buildings and demolitions, as requested in #57 (PR #124). Answer: dem_nb_overlap
- [x] Should we include CO fields to power app? If so, what to name them? #107
- [x] How should we assign the `geom` for records that have address in `TPAD`? Should the Lat/Long come from GeoSupport via the address or should it come by joining the DOB BIN or BBL to building footprints or MapPLUTO? The rest of the spatial data attributes are coming from spatial joins. #63 #74 #79 GRU says that if the record is in TPAD, get the point geometry by taking the centroid from the clipped MapPLUTO BBL
- [x] Did you follow up with DOB about zoning sqft data? #58
- [x] Are there particular field names you would like for each boolean flag in the QAQC checks? #106
- [x] Does "Make status Complete for demolition if it has a value in Date_StatusX" #40 overwrite previous logic in assigning `5. complete` for demolitions (or do we add ^ in addition to previous logic)? Previous logic:
  ```sql
  WHEN a.job_type = 'Demolition' AND status_translate(a._status) IN ('5. Complete','3. Permited') THEN '5. Complete'
  ```
- [x] Final sign-off on schema as listed in #52, which we've updated to be in sync with the spreadsheet and to reflect that capitalization changes cannot be applied
- [x] Is `Nonresid_flag` an intuitive name for describing a building with both residential and nonresidential functions? So a mixed use building would be a building that has both a true for `resid_flag` and `Nonresid_flag`? So `Nonresid_flag` isn't really replacing mixed use, it's the function of the two fields that are replacing the mixed use flag? #52
- [x] One of the QAQC checks is for BISTEST. We already have [code](https://github.com/NYCPlanning/db-developments/blob/637f7391a457037223f5ba48b17dab36202442ee/developments_build/sql/init.sql#L138-L148) to put these records into the research table programmatically by dumping them as 'remove' in the corrections file. Is the QAQC check to find BISTEST records in the raw data (before we remove them) or to find any records that got by our automatic removal? Should we not be taking these records out automatically? #106
- [x] Similarly, we remove duplicates programmatically. Should we not automatically remove them? If we keep the [removal step](https://github.com/NYCPlanning/db-developments/blob/637f7391a457037223f5ba48b17dab36202442ee/developments_build/sql/init.sql#L107-L129), are the duplicate QAQC checks to be done before or after removal? #31 #106
- [ ] ~~Do you have the logic on how to add fields from Furman Center Core dataset? #48 It says HED input was given, but I don't see it.~~ (Not executing in this enhancement period)
- [x] Furman field names are inconsistent between schema spreadsheet and issue #48. Which do we use? We should clarify this in the schema issue. #52
- [ ] ~~Do you have the logic on how to add fields from DHCR dataset? #47 It says HED input was given, but I don't see it.~~ (Not executing in this enhancement period)
- [x] Previously we used occupancy initial and proposed fields to assign the resid_flag value; however, based on new logic you only want to assign something as residential if the record has a value in one of the 6 unit fields. Does this make sense? How will non residential units be captured or removed?
- [x] Related to the previous question. Previously, we set `classa_init` to 0 if:
  ```sql
  job_type='Alteration'
  AND occ_initial ~* 'hotel'
  AND occ_proposed ~* 'Residential|Assisted'
  AND x_mixeduse is null
  ```
  and `classa_prop` to 0 if:
  ```sql
  job_type='Alteration'
  AND occ_initial ~* 'hotel'
  AND occ_proposed ~* 'Residential|Assisted'
  AND x_mixeduse is null
  ```
  We are now using `resid_flag` and `nonres_flag` in place of `x_mixeduse`. If we set `resid_flag` and `nonres_flag` based on `classa_init` and `classa_prop`, it does not make sense for us to turn around and use these fields to alter `classa_init` and `classa_prop`. Should we still be automatically setting these cases to zero? Are they accounted for in the QAQC flag `b_likely_occ_desc` (see #106 for definition of this check)?

2020/06/08 - 2020/06/12
## Field names & schema changes
- [x] For field names, should we use the 10 char max field name for the files outputted by data engineering? Please note that we can only output lowercase letters. Related to #52
- [x] Where does the 13 char max field name come into play?
- [x] What values should the fields `nonres_flag` and `resid_flag` contain? 'Y' and NULL? Related to #27
- [x] Is the new field `nonres_flag` simply the opposite of `resid_flag`? If so, what is the value of including both? Related to #27
- [x] How is `nonres_flag` set? See [thread in #27](https://github.com/NYCPlanning/db-developments/issues/27#issuecomment-643348226).
- [x] Need more clarity on **Recode City Owned, Owner Type, Non-Profit into a single field** #64 (need a lookup table, or clear logic)

## Corrections and QAQC
- [x] #53 How do we correct invalid dates, and who needs to be involved in this conversation? Related to
- [x] #26 Do we still have to output jobs where occ_category = Other and units_init/prop <> 0 for research given how we are recoding occ_category?

## Units
- [x] What are valid reasons for programmatically changing unit fields as in #69?

## HNY
- [x] #105 Where should edits to hny fields occur?
- [x] Is there consensus on what is proposed for the HNY matching workflow outlined in #83 ?

## Other
- [x] #57 What date parameters should we use for matching NB to DMs?
- [x] Check to confirm that we should NULL out any job descriptions that do not have any letter characters in #46
- [x] In the new deduplication logic (#31), we filter to jobs that are not inactive. Do _both_ of the potential duplicate jobs need to have `x_inactive IS NULL` to be output to the QAQC table, or only one job?
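
For reference, the two-column duplicate grouping proposed in the 2020/06/15 - 2020/06/19 questions (`dup_bbl_address` and `dup_bbl_address_units`) could be sketched in Python roughly as below. The record layout, the `units` key, the `|` separator, and the choice to leave records with a NULL bbl, address, or units out of the corresponding group are assumptions, since the NULL-handling question is not answered in this thread.

```python
from collections import Counter
from typing import Dict, List, Optional

Record = Dict[str, object]  # hypothetical keys: job_number, geo_bbl, address, units


def base_key(r: Record) -> Optional[str]:
    """Shared bbl and address, used as a stable alternative to a cluster id."""
    if r["geo_bbl"] is None or r["address"] is None:
        return None  # assumption: NULL keys never form a group
    return f'{r["geo_bbl"]}|{r["address"]}'


def units_key(r: Record) -> Optional[str]:
    """Shared bbl, address, and units."""
    base = base_key(r)
    if base is None or r["units"] is None:
        return None  # assumption: NULL units are not an "equal units" match
    return f'{base}|{r["units"]}'


def add_duplicate_keys(records: List[Record]) -> List[Record]:
    base_counts = Counter(k for k in map(base_key, records) if k is not None)
    unit_counts = Counter(k for k in map(units_key, records) if k is not None)
    out = []
    for r in records:
        b, u = base_key(r), units_key(r)
        out.append({
            **r,
            # Populated only when at least one other record shares the key.
            "dup_bbl_address": b if b is not None and base_counts[b] > 1 else None,
            "dup_bbl_address_units": u if u is not None and unit_counts[u] > 1 else None,
        })
    return out
```

In the three-record example from the proposal, all three records would share a `dup_bbl_address` value, and only the two with equal units would share a `dup_bbl_address_units` value.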