Open AmandaDoyle opened 4 months ago
@AmandaDoyle @jackrosacker and I just reviewed this issue. Our comments and questions are below. In general, we still need to review the data sources, which we intend to start this week.
Zoning districts (Zoning Values within PLUTO) • Is one zoning district range being assigned to a single column or are we checking off if those zoning ranges apply to the project in three Boolean columns? • Is there a hierarchy for the different zoning district ranges: R1 through R4, R5 through R10, C or M? (i.e. if a BBL falls into multiple categories is one range assigned over the others)
Air quality – GIS Team needs to vet the sources
Arterial highways and vent structures – GIS Team needs to review the sources • Arterial highways: if a project is located next to multiple features, do we need to list the name of all or just one? Closest? • Arterial highway source – is this just the arterial highways in the DCM_ArterialsMajorStreets open dataset • Vent tower: is this a distinct question or part of the arterial highway question? Do we need to list the name of the vent?
Elevated subway or railway • if a project is located next to multiple features, do we need to list the name of all or just one? Closest? • Source: GIS Team to compare LION vs data received from PS
Airport – GIS Team needs to vet the source • Confirm if EWR is excluded
Natural Resources and Shadows – GIS Team needs to review the sources (who determines what source is valid?) • if a project is located next to multiple features, do we need to list the name of all or just one? Closest? • Noticed a state and federal wetland dataset – should we use only one? How do we handle conflicts? • Beaches: need to identify a source
Historic and Shadows – GIS Team needs to review the sources • if a project contains or is located next to multiple features, do we need to list the name of all or just one? Closest?
Open Space/Shadows – GIS Team needs to review the sources • Should we include federal park properties?
E-designations • We need to create a field for each type of e-designation (noise, air quality, hazmat)? • For lots with multiple e-designations, should they be concatenated into one field? • Should we include restrictive declarations (i.e. e-numbers that start with “R”) • Source: e-designation table (csv)
Just wanted to clarify - who is point person (on our end) for questions for PS?
("our end" meaning GDE, not DE)
@croswell81 and @jackrosacker please see my answers below
Zoning districts (Zoning Values within PLUTO) • Is one zoning district range being assigned to a single column or are we checking off if those zoning ranges apply to the project in three Boolean columns? • Is there a hierarchy for the different zoning district ranges: R1 through R4, R5 through R10, C or M? (i.e. if a BBL falls into multiple categories is one range assigned over the others)
If it's possible to calculate the Zoning District for the CEQR II form using the existing 4 zoning district fields in PLUTO that may be preferable. To be considered for the R5 through R10 Residential zoning district the 4 zoning district fields must only include R5 through R10 values and be absent of any other zoning district. Whereas to be considered as an R1 through R4, an R1 through R4 value must simply appear in 1 of the 4 zoning district fields. A BBL is assigned a zoning district if 10% or more of the bbl's area is covered by that zoning district. Zoning district values are assigned from most covered to least coverage, with Zoning District 1 covering the greatest area and Zoning District 4 covering the least area. For CEQR II we just need to know if a project is R5 through R10 or R1 through R4.
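A minimal sketch of the rule described above, assuming the four PLUTO zoning district values arrive as plain strings (for example `"R6A"` or `"C4-3"`) and that a leading `R` followed by a number identifies a residential district. The function names are hypothetical, and lots that fit neither range simply return `None` here.

```python
import re

# Assumption: residential districts look like "R3-2", "R6A", "R10".
_RESIDENTIAL = re.compile(r"^R(\d{1,2})")


def residential_number(district):
    """Return the residential district number, or None if not residential."""
    match = _RESIDENTIAL.match(district.strip().upper())
    return int(match.group(1)) if match else None


def ceqr_zoning_range(zoning_fields):
    """Classify a lot from its (up to four) zoning district values."""
    values = [z for z in zoning_fields if z and z.strip()]
    numbers = [residential_number(z) for z in values]
    # R1 through R4: the value only has to appear in one of the fields.
    if any(n is not None and 1 <= n <= 4 for n in numbers):
        return "R1-R4"
    # R5 through R10: every populated field must be R5 through R10.
    if values and all(n is not None and 5 <= n <= 10 for n in numbers):
        return "R5-R10"
    return None


print(ceqr_zoning_range(["R6A", "R3-2", None, None]))
print(ceqr_zoning_range(["R6", "R7-2", None, None]))
```

Checking R1 through R4 first means a lot with both ranges present is treated as R1 through R4, which follows from the "must only include R5 through R10" wording.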
Arterial highways and vent structures – GIS Team needs to review the sources • Arterial highways: if a project is located next to multiple features, do we need to list the name of all or just one? Closest?
We just need to know if a project is near an arterial highway or vent structure, or not. This would be a boolean value; we do not need to report any name.
• Arterial highway source – is this just the arterial highways in the DCM_ArterialsMajorStreets open dataset
That is a question for Andrew E.
• Vent tower: is this a distinct question or part of the arterial highway question? Do we need to list the name of the vent? You do not need to list the name of the vent. Check with Planning Support if vents need to be reported separately from highways.
Elevated subway or railway • if a project is located next to multiple features, do we need to list the name of all or just one? Closest? • Source: GIS Team to compare LION vs data received from PS
We just need to know if a project is near an elevated subway or rail, or not. This would be a boolean value; we do not need to report any name.
Natural Resources and Shadows – GIS Team needs to review the sources (who determines what source is valid?) • if a project is located next to multiple features, do we need to list the name of all or just one? Closest? • Noticed a state and federal wetland dataset – should we use only one? How do we handle conflicts? • Beaches: need to identify a source
Like the above, we just need to know if a project is near a natural resource and not what it is. Given that we just need to know if something is near something, we don't need to worry about conflicts. I propose that the GIS team recommend which data source should be used.
Historic and Shadows – GIS Team needs to review the sources • if a project contains or is located next to multiple features, do we need to list the name of all or just one? Closest?
Same as above.
Open Space/Shadows – GIS Team needs to review the sources • Should we include federal park properties?
I'd ask PS.
E-designations • We need to create a field for each type of e-designation (noise, air quality, hazmat)? • For lots with multiple e-designations, should they be concatenated into one field? • Should we include restrictive declarations (i.e. e-numbers that start with “R”) • Source: e-designation table (csv)
We need to work through e designations with PS and EARD.
Regarding GDE POC to PS and EARD - I'd prefer to not play telephone, so whoever is doing the work should reach out directly, but loop me in on all communications and come to me first if there are any questions you have that you want to discuss internally before reaching out to PS and EARD.
DE and GIS will prioritize 3 categories for a demo on Friday 2/23: Zoning, Air, Noise
GIS needs a version of the new data to load into the survey map
Jack has been using a dummy dataset to prototype the map
I am cleaning up the CEQR_Type_II_Data_Source_Review.xlsx doc. I will let you know when all edits are completed.
We've imported all source data and confirmed output specifications for the POC
Now working on build logic. To help keep GIS unblocked, we'll export mock data by tomorrow morning.
Noting a couple of data items that popped up last week that I don't want to forget:
Since Arterial Highway is such a specific dataset, I would recommend keeping DCM arterial centerline data and doubling the buffer to 150 feet. We should confirm with Planning Support and EARD but it would be a much easier task than trying to recreate the arterial highway list from another source. @jackrosacker
Could we use the centerline to select associated planimetric features within a set distance, and then base the 75' buffers off of those selected features? (Damon's idea from last week I think)
It would take someone going in and cleaning up all other roadways that are within that 75' buffer. The planimetric data doesn't have attribute data to narrow down what could be selected.
Also, we should remember this is a flag, not a precise calculation. The Arterial Highway dataset is the most accurate, and using it lets us rely on an existing public resource. Adjusting the buffer is the better way to go.
The planimetrics dataset is derived from aerial images and is updated only every 2-4 years, so it is not as reliable.
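For reference, the centerline-plus-buffer flag could be sketched roughly as below. This assumes coordinates are already in a feet-based projection, that a lot is a simple ring of vertices, and that the centerline does not cross the lot; all function names are hypothetical, and a real build would more likely use a spatial database or geometry library.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def min_distance(lot_ring, centerline):
    """Smallest distance between a closed lot ring and an open polyline.

    Assumption: the two do not cross, so endpoint-to-segment distances suffice.
    """
    lot_edges = list(zip(lot_ring, lot_ring[1:] + lot_ring[:1]))
    line_edges = list(zip(centerline, centerline[1:]))
    best = math.inf
    for a, b in lot_edges:
        for c, d in line_edges:
            best = min(
                best,
                point_segment_distance(a, c, d),
                point_segment_distance(b, c, d),
                point_segment_distance(c, a, b),
                point_segment_distance(d, a, b),
            )
    return best


def arterial_flag(lot_ring, centerline, buffer_ft=150.0):
    """True if the lot is within buffer_ft of the arterial centerline."""
    return min_distance(lot_ring, centerline) <= buffer_ft
```

Because the output is only a flag, changing `buffer_ft` from 75 to 150 is the only adjustment needed to test the wider buffer.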
Okie doke. We can run the increased buffer idea by folks in the pm meeting today
@croswell81 @jackrosacker
During one of our chats with Planning Support, there was mention of an Air Quality check I don't think we're doing yet: "are you within 400 feet of a manufacturing land use?"
It seems like the logic for this flag would be something like:
And the source data to pass along for the map would be "all manufacturing lots"
Does that sound right?
That sounds tentatively correct to me, pending any changes from Matt. I added a placeholder question for this in the survey as well, until we have final language. I forwarded you the email from Alex with the initial question, and I'm placing the email text here:
Hi ITD Team,
We had a discussion with Stephanie today that included the desire for an addition to the air quality section of the tool. We currently do not have any logic to flag the potential for an unpermitted air quality source, and it is just in a note necessitating a manual check. We are hoping it may be possible to query within a 400 foot buffer from the site if there is any lot with a manufacturing land use (using PLUTO). If there is, we would flag that they need to check this study area for unpermitted industrial sources.
The question would be something like: Are there any manufacturing or processing facilities operating within 400 feet of the Development Site that may be unpermitted sources of air quality emissions?
We are happy to use some of our time next week to discuss.
Definitely an item to work on further with them.
@damonmcc @jackrosacker They are looking to identify uses, not zoning, so the query or filter should use Land Use code = 06.
We should confirm with Planning Support and EARD there are no other Land Uses or building codes that allow "processing facilities" besides '06'.
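A rough sketch of the 400-foot manufacturing check discussed above, under loud assumptions: each lot is reduced to a bounding box in a feet-based projection (a stand-in for the real lot polygon), the development site itself is excluded from the results, and `"06"` is the only qualifying land use code until Planning Support and EARD confirm otherwise. The `Lot` class and function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Lot:
    bbl: str
    landuse: str  # PLUTO land use code, e.g. "06"
    bbox: tuple   # (xmin, ymin, xmax, ymax) in feet; stand-in for the polygon


def bbox_distance(a, b):
    """Distance between two axis-aligned boxes (0 if they touch or overlap)."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return math.hypot(dx, dy)


def manufacturing_lots_within(site, lots, distance_ft=400.0, landuse_codes=("06",)):
    """Return BBLs of manufacturing lots within distance_ft of the site."""
    return [
        lot.bbl
        for lot in lots
        if lot.bbl != site.bbl  # assumption: the site itself is not counted
        and lot.landuse in landuse_codes
        and bbox_distance(site.bbox, lot.bbox) <= distance_ft
    ]


def air_quality_manufacturing_flag(site, lots):
    """True if any manufacturing lot falls within 400 feet of the site."""
    return bool(manufacturing_lots_within(site, lots))
```

Returning the list of matching BBLs as well as the flag would also give the map its "all manufacturing lots" source data for the selected site.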
@damonmcc, Alex shared this pilot area with me: 3285 Fulton Street BBLs available in the ZAP link
And a second pilot area for BBL 4122770001. I'll see if we can add you to the PPT, I'm not sure how much of the draft doc info is ok to live here.
@damonmcc et al, I spotted a couple of possible issues with the zoning classifications within the green fast track bbl dataset. Item 1: Lots with some R1-R4 present are getting classified as R5-R10. In this example, lot 1 (red underline) is classified as R1-R4, while lot 70 (blue underline) is classified as R5-R10. I think both should be R1-R4, since some part of the lot in both cases is within that range. I could be wrong about this, so let's discuss before you start re-engineering anything.
Item 2: Lots with no zoning category at all See BBL 4124950002
I haven't done any form of comprehensive error checking yet, just flagging these as they pop up
From my understanding, we (AD and PS) agreed on "pluto classification" as truth for whether a lot is R1-R4 (or contains any specific district). That threshold is 10% of the lot's area.
@jackrosacker @fvankrieken beat me to it. We don't assign a zoning district to a PLUTO lot if it covers less than 10% of the lot, and that will apply to this app and data as well.
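To make the 10% rule concrete, here is a small sketch that only encodes what is stated in this thread: a district is assigned when it covers 10% or more of the lot, and assigned districts are ordered from most to least coverage, up to four. The function name and the input shape (district name mapped to covered area) are assumptions; the actual PLUTO build may apply further rules.

```python
def assign_zoning_districts(lot_area, coverage, threshold=0.10, max_fields=4):
    """Return up to four districts ordered from greatest to least coverage.

    coverage maps a zoning district name to the lot area it covers,
    in the same units as lot_area.
    """
    qualifying = [
        (area, district)
        for district, area in coverage.items()
        if area / lot_area >= threshold
    ]
    # Zoning District 1 covers the greatest area, Zoning District 4 the least.
    qualifying.sort(key=lambda pair: pair[0], reverse=True)
    return [district for _, district in qualifying[:max_fields]]


# A sliver of R3-2 under 10% is dropped, so only R6 is assigned.
print(assign_zoning_districts(1000.0, {"R6": 920.0, "R3-2": 80.0}))
```

This is why a lot with a small R1-R4 portion can still end up classified as R5-R10.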
I'll look into 4124950002
Ok, I'd lost track of the fact that we're following the 10% rule for GFT as well. I'll add that to the zoning aggregation description in my presentation.
@fvankrieken - seeing ~10k lots without any zoning classification in what I believe is the latest version of the dataset. There might be a legitimate reason for some of these that's eluding me
Current implementation is based on the original zoning logic, which said one of
4124950002 is R6 and C4, meaning it fits none of those three categories. I think you and @damonmcc had come up with a revised little decision tree, which will need to be implemented.
Though still not quite sure what this should be "flagged" as? For this specific case of R6 and C4?
Pretty sure this one would be a C or M lot. My understanding of the language is that it only requires the presence of C or M, not necessarily "wholly."
This is probably a post-demo discussion. Sounds like we should discuss and refine the zoning decision tree, and choose a non-null text value to indicate lots that are in none of the buckets above ("Other", "Ineligible", etc.)
As I'm writing this I'm wondering if we also need to account differently for lots that are a mixture of R1-R4 and C or M.
After the demo, maybe GIS and DE can get together again to review the logic of each field, including updating buffer distances.
Logging some thoughts here for how to account for lots with both `R1-R4` and `C or M` present. Each option represents a hypothetical project site in which four lots are selected, each with the following zoning values.
Option 1: Table has a single zoning category column, with four possible options per BBL.

BBL | Zoning Category |
---|---|
1111111111 | R1-R4 |
2222222222 | R5-R10 |
3333333333 | C or M |
4444444444 | R1-R4 with C or M |
Option 2: Table has two zoning columns, resi and c or m. There are still four possible zoning options per BBL, but pulled from values combined across both columns. This example uses the actual value within the C or M column, not just a Yes/No option.

BBL | Residential Zoning | C or M Zoning |
---|---|---|
1111111111 | R1-R4 | |
2222222222 | R1-R4 | C or M |
3333333333 | R5-R10 | |
4444444444 | | C or M |
Option 3: Table has two zoning columns, resi and c or m. There are still four possible zoning options per BBL, but pulled from values combined across both columns. This example uses a Yes/No value within the C or M column, not the actual value.

BBL | Residential Zoning | C or M Zoning |
---|---|---|
1111111111 | R1-R4 | No |
2222222222 | R1-R4 | Yes |
3333333333 | R5-R10 | No |
4444444444 | | Yes |
For our upcoming GFT data meeting @croswell81 @damonmcc @fvankrieken (since I think you're tackling the zoning stuff?)
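As a quick check that the single-column layout and the Yes/No two-column layout above carry the same four values, here is a sketch of a lookup between them. The mapping is read directly off the example tables; the names `SINGLE_TO_TWO_COLUMN`, `to_two_columns`, and `to_single_column` are hypothetical.

```python
# Single zoning category -> (Residential Zoning, C or M Zoning as Yes/No)
SINGLE_TO_TWO_COLUMN = {
    "R1-R4": ("R1-R4", "No"),
    "R5-R10": ("R5-R10", "No"),
    "C or M": ("", "Yes"),
    "R1-R4 with C or M": ("R1-R4", "Yes"),
}

# Reverse lookup; valid because each two-column pair is unique.
TWO_COLUMN_TO_SINGLE = {pair: cat for cat, pair in SINGLE_TO_TWO_COLUMN.items()}


def to_two_columns(category):
    """Split a single zoning category into the two-column form."""
    return SINGLE_TO_TWO_COLUMN[category]


def to_single_column(residential, c_or_m):
    """Combine the two-column form back into one category.

    Raises KeyError for combinations outside the four options,
    such as R5-R10 together with C or M.
    """
    return TWO_COLUMN_TO_SINGLE[(residential, c_or_m)]
```

Since the conversion is lossless in both directions, the choice between the layouts is about readability of the export rather than information content.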
@jackrosacker should Option 2 have more specific values in the `C or M Zoning` column? It says "This example uses the actual value within the C or M column, not just a Yes/No option."
notes from DE & GIS chat on 4/2
@jackrosacker @croswell81
from our chat about Lot Zoning vs Project Zoning, I tried to illustrate the logic we'd implement in order to do Option 1 above (one Zoning column). I'll put the diagram in this comment and this link to the PR it's coming from
@damonmcc @jackrosacker This looks correct to me.
Agreed, with one amendment: `Any R5 - R10?` should be `Entirely R5 - R10?`
@damonmcc @croswell81
thanks @jackrosacker! I think I see what you mean about changing to `Entirely`, so I revised it to be the diagram below.
I wanted to still use `Any` because that frame of mind translates really well to the logic/code we'll have to write. Hope this captures it!
Cool, yeah that seems to cover more bases. I wonder if there's ever a circumstance in which there's a third branching option from the `Any R5-R10` in which the lot is partially `Other`, meaning that the lot could not be classified as "Entirely R5 - R10". Or other possible edge cases? I'll leave subsequent comments in the #741 issue
@damonmcc
@damonmcc I wanted to also suggest that we find and record a BBL for each possible zoning combination to use as benchmarks. If that's easy for DE to gather while writing the queries, could you add a list of BBLs to this issue? I'm happy to poke around and find them as well if helpful.
@damonmcc
Notes from Damon <> Jack on 2024-04-09:
`source_data_versions` in data GDB)
@damonmcc I've been able to successfully publish the `source_data_versions` table to ArcGIS Online, and pull the dataset name and vers/date values into the survey as static values. This will enable us to print out the data versions at the bottom of each report PDF for easy reference. A few things occurred to me while implementing this:
This doesn't have to happen before the Beta.
@jackrosacker
Are the column headers finalized for this table? The naming convention doesn't matter at all, but the report will break if we change convention later
The column headers are finalized. This table is generated by code that we use in all builds to load source data.
Noticing variability in the date format in the 'v' column. I personally prefer the 1900-01-01 format. Requesting that we normalize to one format
Definitely possible. Since these are versions of source data, DE kind of treats these more like strings than dates. So changing the format sort of breaks the 100% certainty that we'll find that exact value in `edm-recipes`, but no worries! We're talking about formatting for display, and DE can always retrieve the pre-formatted value if we need to.
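One way to normalize for display while keeping the exact stored string is sketched below. The list of candidate input formats is an assumption (the thread does not say which formats appear in the 'v' column), and `display_version` is a hypothetical helper name. Values that are not dates are passed through untouched.

```python
from datetime import datetime

# Assumption: candidate formats; extend to match what the 'v' column holds.
_CANDIDATE_FORMATS = ("%Y-%m-%d", "%Y%m%d", "%m/%d/%Y", "%Y/%m/%d")


def display_version(raw):
    """Return (raw, display) for a source data version string.

    display is YYYY-MM-DD when raw parses as a date, otherwise raw itself.
    raw is always returned unchanged so the exact value can still be looked up.
    """
    for fmt in _CANDIDATE_FORMATS:
        try:
            return raw, datetime.strptime(raw, fmt).date().isoformat()
        except ValueError:
            continue
    return raw, raw


for value in ("20240409", "04/09/2024", "24v1"):
    print(display_version(value))
```

Keeping both values side by side means the report can print the normalized date without losing the string DE uses as the version key.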
Are we confident that each of the version/date values are getting updated when each build happens? I don't see anything to indicate otherwise, but want to be sure these values are accurate before printing into the report
Yup, this file `green_fast_track/recipe.yml` ensures that every build uses a particular version (in this case the latest version), which is then documented in the source data version table.
Do the dates indicate a specific thing? i.e. date the data was ingested for external, date the data was built for internal, etc.
This varies by dataset, specifically the source of the dataset. If the created date is programmatically available, the version is the created date. If it isn't available, the version is the ingested date.
Everything we ingest from NYC Open Data or an ArcGIS feature server uses the "last updated" value as the version.
We'd like to make this more clear for all source data and perhaps could reflect the details by one or more new columns in this table, but don't plan to do that soon. We may be able to describe what the version means for each dataset in DE's GFT documentation though!
@damonmcc thanks for this. Glad to hear that the data is reliably up to date, and that the field names aren't expected to change. Based on what you're saying, I think we should:
from @jackrosacker on 5/1
Priority 1
Priority 2
Priority 3
@fvankrieken and @alexrichey, following up on our conversations this week with a punch list of Fast Tracker items. I've tried to be as comprehensive as possible, but there are a few items such as dataset aggregation and column/dataset naming review that will probably benefit from a closer look together before DE puts in too much work.
Also - this list alludes to but does not fully cover the remaining datasets to be added. These fall into two buckets: (1) datasets that have been cleared by Matt/Planning Support and handed off to DE but haven't been processed yet, and (2) datasets that are still being digitized/approved.
I'm out of office this afternoon but could talk through and help prioritize the below list tomorrow if either/both of you are available.
Update green_fast_track_bbl dataset:
Finish adding per-question flag columns
Existing, in 4/25 GDB:
State_Regulated_Freshwater_Wetlands___Checkzone_Flag
State_Archaeological_Areas_Flag
Natural_Resource_Shadow_Flag
Needed, calculated by GIS Team from 4/25 GDB:
Natural_Resource_Flag
Historic_Districts_Flag
Historic_Resource_Flag
Historic_Resource_Adjacent_Flag
Open_Space_Shadow_Flag
Historic_Resource_Shadow_Flag
Add Natural Heritage Communities name/id column
(optional) Ensure that any outstanding datasets have a corresponding column created, even if all values are null
Edit field names for consistency:
Take note of any fields that keep the name but change meaning. See Damon's example re Historic Districts: "And there's one flag that will be renamed (and not show in the tool anyway) as to not conflict with a question flag: Historic Districts -> City Historic Districts."
cc @croswell81
@damonmcc @croswell81 I added data to our running GFT Data Sources sheet.
`green_fast_track_bbl` field names, based on my aliases and your existing field names (my alias uses "elevated" because that's in the survey, but DE switched to "exposed" which I like)
Take a look when you get a chance and we can regroup as needed.
@damonmcc @jackrosacker Latest data updates (refer to bullet 2 in Jack's comment from 5/8 above). All data updates are reflected in the CEQR Type II Data Source Review doc.
Updates:
Pending:
Noting to @damonmcc and @croswell81 that as I understand our design, the output CSV will have a single name/id column per dataset, meaning that e.g. if a lot both intersects with a tidal wetland (Nat Res question) and is also within 200ft of that tidal wetland (Shadow question), the name/id of that feature will appear once(?) in the single name/id column for that dataset, but will not differentiate in the export which question or flag it is associated to.
I think this makes sense to some degree, but had been vaguely assuming that the tidal wetland id would appear under a column for the natural resources intersection and another for the shadow buffer.
Does this line up with your understanding? Or am I missing something and we're planning to reflect that same tidal wetland ID in two columns, one per question/section?
@jackrosacker I was thinking that any resource that triggers a buffer that intersects a project lot would only be in one column (i.e. historic dist (contains, within 90 ft, within 200 ft - shadow)), but I realize the buffers are different and therefore not all resources will apply to all buffers and questions.
I think we should send to Planning Support and see if they care before we have DE add a bunch of new columns to the export table. cc: @damonmcc
Started to plot this out in advance of emailing PS, and ran into a few other wrinkles. Let's take an example project like below:
Iteration 1 - Our current design would have a single CSV column per dataset, regardless of how many times that dataset relates to the project, or through which question/spatial relationship:
BBL | NYC Hist Res ID | NYS Hist Res ID |
---|---|---|
3030530013 | F. J. Berlenbach House | |
3030530016 | F. J. Berlenbach House | |
3030530019 | F. J. Berlenbach House | |
Iteration 2 - The alternative we discussed above, which ends up adding a column per question and per dataset, so that for each BBL you know the dataset, feature ID, and question/spatial relationship relevant:

BBL | NYC Hist Res ID | NYC Hist Res ID - Adjacent | NYC Hist Res ID - Shadows | NYS Hist Res ID | NYS Hist Res ID - Adjacent | NYS Hist Res ID - Shadows |
---|---|---|---|---|---|---|
3030530013 | F. J. Berlenbach House | | | | | |
3030530016 | F. J. Berlenbach House | | | | | |
3030530019 | F. J. Berlenbach House, EXAMPLE RESOURCE FROM SAME SRC DATASET | | | | | |
Iteration 3 - A third option, in which each question has a corresponding data name/id field in the csv, and names/ids are grouped with a categorical prefix to indicate the relevant data source (imaginary datasets in all caps to demonstrate how multiple datasets would be aggregated into a single column):

BBL | Hist Res ID | Hist Res ID - Adjacent | Hist Res ID - Shadows |
---|---|---|---|
3030530013 | NYC: F. J. Berlenbach House | | |
3030530016 | NYC: F. J. Berlenbach House, NYC: AN EXAMPLE RESOURCE, NYS: AN EXAMPLE RESOURCE | | |
3030530019 | NYC: F. J. Berlenbach House, NYC: AN EXAMPLE RESOURCE | | |
Iteration 1 is the most concise, but makes it harder for an applicant or EARD to review an application and understand how the BBLs, flag datasets, and questions interact with one another. Iteration 2 makes the review easier, but substantially increases the number of columns required. Iteration 3 is a combination of the other two, with the benefit of having fewer output fields but data that is harder to parse per CSV cell.
After exploring these directions, Iteration 1 (what we have now) feels the most viable. I don't currently have any other ideas for how to design this, do you two? @croswell81 @damonmcc
Edit: removed numeric values from field names, added an example of a concatenation of multiple features per question/bbl in iteration 2
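For comparison, the Iteration 3 layout (one id column per question, with a dataset prefix on each value) could be assembled along these lines. The input shape of `(bbl, question, dataset, feature_name)` tuples and the function name `aggregate_ids` are assumptions for illustration, not the existing build code.

```python
from collections import defaultdict


def aggregate_ids(matches):
    """Group prefixed feature names per BBL and per question.

    matches is an iterable of (bbl, question, dataset, feature_name).
    Returns {bbl: {question: "DATASET: name, DATASET: name"}}.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for bbl, question, dataset, name in matches:
        label = f"{dataset}: {name}"
        # Keep first-seen order and skip exact repeats.
        if label not in grouped[bbl][question]:
            grouped[bbl][question].append(label)
    return {
        bbl: {question: ", ".join(labels) for question, labels in questions.items()}
        for bbl, questions in grouped.items()
    }


rows = [
    ("3030530016", "Hist Res ID", "NYC", "F. J. Berlenbach House"),
    ("3030530016", "Hist Res ID", "NYS", "AN EXAMPLE RESOURCE"),
]
print(aggregate_ids(rows))
```

The comma-joined cell is where the "harder to parse" cost shows up: a feature name that itself contains a comma would be ambiguous, so a different separator may be worth considering.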
@jackrosacker love having an example and those tables!
Iteration 1 is how we had it. While making the changes to have the flag-column-per-question structure of the final table, there's now a single id-column-per-question in that final table.
Iteration 3 is my favorite: one column of ID values per question. But I wonder if having column names like `Hist Res - Adjacent` would be better than `Hist Res ID - 90'`, so that people can easily relate the column to the question and you won't have to maintain buffer values in alias strings.
If EARD review requires "exactly which dataset did that value come from", Iteration 2 seems like something we can add later in addition to Iteration 3.
@jackrosacker @damonmcc The example is missing how many of these values would just be repeated, since any lot that has a historic resource will also be in the buffer, and any resource within 90 feet will also be within 200 feet.
My concern is this could potentially add dozens of fields since there are 10+ natural resource fields that go into NR shadow, and another 5-8 historic resources with two buffers, etc.
We should try to meet tomorrow when Jack is available.
Project Description
DE will extract source data, do geospatial calculations, and produce outputs needed for the new Green Fast Track survey GIS is making.
Timeline
Friday 2/23
For a demo of the POC, DE and GIS will focus on a subset of CEQR categories: Zoning, E-designation, Air, Noise.
Wednesday 3/20
Planning Support will host a forum with applicant reps for feedback on Eligibility Tool. The goal is to confirm it would work for their workflow.
Friday 4/19
Improve GFT tool by incorporating remaining variables and critical feedback.
June 3 public launch
after public launch
Background
The City Environmental Quality Review (CEQR) process identifies and assesses the potential environmental impacts of land use actions that are proposed by public or private applicants.
DCP plans to streamline housing construction by allowing potential applicants for land use actions to determine whether their project is minor enough to qualify as CEQR Type II and therefore be exempt from environmental review. The tool for this is now known as the Green Fast Track Eligibility Tool, or the GFT tool.
GIS is creating a survey for potential applicants to use to determine if their project is CEQR Type II.
The determination will be based on the project area. A project area is defined as all relevant tax lots. For each tax lot, all relevant CEQR considerations must be checked.
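A sketch of how per-lot results might roll up to a project area, assuming boolean flags per tax lot and assuming a project-level flag is raised when any of its lots is flagged. That "any" rule is an assumption that suits proximity flags; zoning follows its own logic and is not covered here. The function name and input shape are hypothetical.

```python
def project_flags(lot_flags, project_bbls):
    """Combine per-lot boolean flags into project-level flags.

    lot_flags maps bbl -> {flag_name: bool}.
    Assumption: a project flag is True if any selected lot is flagged.
    """
    missing = [bbl for bbl in project_bbls if bbl not in lot_flags]
    if missing:
        raise KeyError(f"No flag data for BBLs: {missing}")
    names = sorted({name for bbl in project_bbls for name in lot_flags[bbl]})
    return {
        name: any(lot_flags[bbl].get(name, False) for bbl in project_bbls)
        for name in names
    }
```

Raising on a missing BBL, rather than treating it as unflagged, avoids silently reporting a project as clear when a lot has no data.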
Potential geospatial logic
Zoning districts
`CR%` value. `CR - n` translates to Special Coastal Risk District, where n is the number of the district.
Air quality
Add three binary fields indicating if the tax lot intersects with any of the following buffers:
Arterial highways and vent structures
Vent tower data source BBLs: 1005950090, 1006800001, 1000180100, 1006560009, 1006650020, 1013530012, 3005040050, 4000130025 - Question for PS: How was the vent tower list created and how will it be maintained? Question for GIS: Is there a better data source?
Elevated subway or railway
Airport
Natural Resources and Shadows
Historic and Shadows
Open Space/Shadows
E-designations