add remaining Green Fast Track variables

damonmcc commented 2 months ago

see meeting notes here and source data spreadsheet here

extract new source data

[x] #674
[x] #672
[x] #673
[ ] Exposed Rail Yards
[ ] NYCDOB Tidal Wetland
[ ] NYCDOB Freshwater Wetland
[ ] NYCDOB Coastal Erosion Hazard Area

adding logic for new variables

[x] #732 (added by @sf-dcp on 4/3/24)
[x] Natural Resources
[x] Historic Resources
[ ] Exposed Rail Yards
[ ] NYCDOB Tidal Wetland
[ ] NYCDOB Freshwater Wetland
[ ] NYCDOB Coastal Erosion Hazard Area

exporting new source data

[ ] export all new source data in FGDB

GFT dataset details

The primary GFT table has a row for each PLUTO lot and columns to represent each "GFT variable". These variables are grouped into "CEQR categories".
For each variable, there's a single binary flag column and at least one value column.
When a variable flag is "Yes", the variable value columns show the source data values that contribute to the flag.
Each GFT variable has one source dataset.
Category -> Variables -> Source
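
To make the table shape concrete, here is a minimal sketch of two rows of the primary table. The column names (`bbl`, `state_parks_flag`, `state_parks`) and the park name are hypothetical, not taken from the actual schema. The check also assumes a value is present only when the flag is "Yes"; the notes above only state the "Yes" direction.

```python
# Hypothetical rows: one per PLUTO lot, keyed by BBL.
# Each variable contributes one binary flag column and at least one value column.
rows = [
    {"bbl": "1000010001", "state_parks_flag": "Yes", "state_parks": "Example State Park"},
    {"bbl": "1000010002", "state_parks_flag": "No", "state_parks": None},
]


def flag_matches_value(row, flag_col, value_col):
    """Return True when a value is present exactly when the flag is "Yes" (assumed rule)."""
    flagged = row[flag_col] == "Yes"
    has_value = row[value_col] is not None
    return flagged == has_value


results = [flag_matches_value(r, "state_parks_flag", "state_parks") for r in rows]
```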

more GFT dataset details

We have a CSV to describe all GFT variables here.
- It maps each variable to a category.
- Each variable has at least one source.
- Category -> Variables
Other than combing through the SQL code, we don't yet have the Variable -> Sources mappings documented (something similar to the source data Excel file).

Potential approach to confirming existing variables

Ensure records in variables.csv cover all expected GFT variables by comparing it to the source data Excel file
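
A rough sketch of that comparison, assuming the expected variable names have been transcribed from the spreadsheet by hand. The `variable_type` and `category` column names, the category value, and the sample rows are all hypothetical; the real variables.csv may be laid out differently.

```python
import csv
import io

# Hypothetical stand-in for variables.csv; real column names may differ.
variables_csv = io.StringIO(
    "variable_type,category\n"
    "nyc_parks_properties,Shadows\n"
    "nys_parks_properties,Shadows\n"
)

# Hypothetical list of variables transcribed from the source data spreadsheet.
expected_variables = {
    "nyc_parks_properties",
    "nys_parks_properties",
    "us_parks_properties",
}

documented = {row["variable_type"] for row in csv.DictReader(variables_csv)}

# Variables in the spreadsheet that variables.csv doesn't cover yet.
missing = sorted(expected_variables - documented)
# Variables in variables.csv that the spreadsheet doesn't mention.
unexpected = sorted(documented - expected_variables)
```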

Potential approach to adding a new/missing variable (builds/tests fail until the last step)

Add a record to the variables.csv
Create a dcpy/library/templates/ YAML file for the source (if needed)
Add the source to recipe.yml and _sources.yml (if needed)
Create a staging model by listing it in the properties file and adding a script (if needed)
Create a new intermediate model and test variable_id for null and unique
Add the new intermediate model to the list in int_buffers__all
Update and review the pilot project records in test_expected_pilot_projects.csv by copying the table the csv was compared to
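
For the null and unique checks on variable_id, the logic is roughly the following. This is a plain-Python illustration of what the two tests assert, not the test configuration used in the project; the row shape is an assumption.

```python
from collections import Counter


def check_variable_ids(rows):
    """Return (number of null variable_ids, sorted list of duplicated variable_ids)."""
    ids = [row["variable_id"] for row in rows]
    null_count = sum(1 for value in ids if value is None)
    counts = Counter(value for value in ids if value is not None)
    duplicates = sorted(value for value, n in counts.items() if n > 1)
    return null_count, duplicates


# Hypothetical rows from a new intermediate model.
sample_rows = [
    {"variable_id": "Example Park"},
    {"variable_id": "Example Park"},
    {"variable_id": None},
    {"variable_id": "Another Park"},
]
null_count, duplicates = check_variable_ids(sample_rows)
```

A model passes both checks only when `null_count` is 0 and `duplicates` is empty.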

damonmcc commented 2 months ago

(not vital, just a thought)

for geometries that are eventually checked for intersections with all lots during int__spatial_flags, it may be nice to distinguish between buffered and non-buffered polygons before that model uses them

they could all still end up in the same table, but that table is currently called int__all_buffers

damonmcc commented 2 months ago

after new Shadows data has been added to GFT, the DAG with the filter --select intermediate is below.

while adding logic to use the new data, I'll try to add a test we've talked about which would warn or error when new variables like nyc_parks_properties don't appear in the final table
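
A rough idea of what that test could look like, written as plain Python rather than as a test in the build. The function names, the `severity` switch, and the assumption that variables can be matched by name against something in the final table are all hypothetical.

```python
import warnings


def find_missing_variables(expected_variables, final_table_names):
    """Return expected variables that don't appear among the final table's names."""
    return sorted(set(expected_variables) - set(final_table_names))


def report_missing(missing, severity="warn"):
    """Warn or raise, depending on severity, when any variables are missing."""
    if not missing:
        return
    message = f"variables missing from final table: {missing}"
    if severity == "error":
        raise ValueError(message)
    warnings.warn(message)


# Hypothetical inputs: names from variables.csv vs. names found in the final table.
missing = find_missing_variables(
    ["nyc_parks_properties", "nys_parks_properties"],
    ["nys_parks_properties"],
)
```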

damonmcc commented 2 months ago

for some reason during the ST_INTERSECTS part of int_flags__spatial, each lot is "intersecting" twice with polygons from int_buffers__nys_parks_properties

[Screenshot 2024-04-02 at 5:19:47 PM]

the buffered NY State Parks polygons seem ok though

fvankrieken commented 1 month ago

for some reason during the ST_INTERSECTS part of int_flags__spatial, each lot is "intersecting" twice with polygons from int_buffers__nys_parks_properties

the buffered NY State Parks polygons seem ok though

Was curious if there was an odd data issue so poked around. Duplicated line here!

Damon edit after offline chat: they aren't duplicated

damonmcc commented 1 month ago

looks like int_buffers__us_parks_properties is actually just stg__nys_parks_properties, so the union all in int_buffers_all has duplicates

[Screenshot 2024-04-03 at 10:17:11 AM]

@sf-dcp looks like we can drop the int_buffers__us_parks_properties model? I don't see any mention of US Parks in GIS's source data excel sheet

sf-dcp commented 1 month ago

looks like int_buffers__us_parks_properties is actually just stg__nys_parks_properties, so the union all in int_buffers_all has duplicates

@sf-dcp looks like we can drop the int_buffers__us_parks_properties model? I don't see any mention of US Parks in GIS's source data excel sheet

Wow that's a great catch! int_buffers__us_parks_properties doesn't use correct stage table. It should use stg__us_parks_properties instead. US Parks is listed as Federal Parks property in the GIS spreadsheet
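
A toy illustration of why the mis-wired model doubles the intersections. The lists stand in for the staging and intermediate models, and the park names are placeholders rather than real records.

```python
from collections import Counter

# Hypothetical staging rows; the names are placeholders.
stg__nys_parks_properties = [
    {"variable_id": "Example State Park A"},
    {"variable_id": "Example State Park B"},
]
stg__us_parks_properties = [
    {"variable_id": "Example Federal Park"},
]


def build_buffer_model(staging_rows, variable_type):
    """Stand-in for an int_buffers__* model: tag each staging row with its variable type."""
    return [{**row, "variable_type": variable_type} for row in staging_rows]


nys = build_buffer_model(stg__nys_parks_properties, "nys_parks_properties")
# Mis-wired: the US Parks model reads the NYS staging table.
us_wrong = build_buffer_model(stg__nys_parks_properties, "us_parks_properties")
# Corrected: the US Parks model reads its own staging table.
us_fixed = build_buffer_model(stg__us_parks_properties, "us_parks_properties")


def rows_per_id(*models):
    """Concatenate models the way UNION ALL does and count rows per variable_id."""
    return Counter(row["variable_id"] for model in models for row in model)


wrong_counts = rows_per_id(nys, us_wrong)  # every NYS polygon appears twice
fixed_counts = rows_per_id(nys, us_fixed)  # every polygon appears once
```

With the wrong staging table, any lot that intersects a NYS polygon matches two identical rows, which lines up with the doubled intersections seen earlier in the thread.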

sf-dcp commented 1 month ago

for some reason during the ST_INTERSECTS part of int_flags__spatial, each lot is "intersecting" twice with polygons from int_buffers__nys_parks_properties

the buffered NY State Parks polygons seem ok though

Was curious if there was an odd data issue so poked around. Duplicated line here!

It doesn't seem to be duplicated as NYS and NYC properties are different

damonmcc commented 1 month ago

notes from DE & GIS chat on 4/2

Lot Zoning info

R1-R4 can exist with C or M
R5-R10 cannot exist with C or M
for current NULL values use one of these: Ineligible, Not Applicable, Other

Natural Resources

survey questions:
- does your project site contain a natural resource?
- is your project site near a wetland check zone?
- Are you near another ... ? (not data-driven)
App Table vs Output CSV
- show single binary Natural Resources answer
- show all columns related to that answer

Historic Resources (Alex)

distinction b/w resource and district
- resource = point
- district = polygon
"does your project contain" (look for points)
"are you near" (buffer lots that contain points or buffer points)

Rail

GRU is gonna make changes for 24B

Rail Yards

GIS can use polygons (Complex Polygons)
not in CSCL because they're Complexes

Beaches

Non-DPR parks have no names

sf-dcp commented 1 month ago

Hi @croswell81 & @jackrosacker,

I've been working on processing steps for Shadows/Open Space data, and I have questions/concerns. Please see them below by variable name:

Federal Parks Property
- gnis_id column is Null for several (6) records that are not in NYC. So potentially in the future it may be Null for NYC resulting in Null values for variable_id
NYS Parks Property
- according to the instructions, county should be filtered for values: Bronx, Kings, New York, Queens, Richmond. The source data doesn't have records with New York. Also, it mentions not to include Manhattan - need to confirm
- uid column is Null for 164 records where 2 of them are within NYC. It results in Null values for variable_id as uid-name
NYC Parks Property
- This dataset needs to be filtered by typecategory and we initially had categories to exclude from an email thread. In the spreadsheet, it doesn't exclude Cemetery - do we now want cemeteries to be in the data?
WPAA
- According to the instructions, we are supposed to filter this dataset by Status. There is no status column in the source data.

If it's easier to meet to go over these questions, LMK!

croswell81 commented 1 month ago

@sf-dcp

Federal Parks Property: you should only use parks within NYC, so don't worry about the ones not in NYC. If there are future null values, we are still using the park name [PARKNAME] so the value will not be completely null.
NYS Parks Property: just cleared this up. There are no current records in Manhattan (aka New York county) but we should include it in the filter in case one is created in the future. If there are null values, we are still using the park name [Name] so the value will not be completely null.
NYC Parks Property: Good catch! Cemetery should be filtered out, I updated the data source doc.
WPAA: It looks like there is an alias field name for status as construction status. Let us know if you don't see either field.

I asked Planning Support if they wanted to use Global ID for most of the open space datasets that have one, since it is the only unique id, and they were only concerned with name.
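
As a sketch of the NYS Parks Property handling described above: filter on all five counties (including New York, even though it has no records today) and build the ID so that a null uid still leaves the park name. The field names, the "-" separator, and the sample records are assumptions for illustration.

```python
# All five NYC counties, including "New York" for any future Manhattan records.
NYC_COUNTIES = {"Bronx", "Kings", "New York", "Queens", "Richmond"}


def build_variable_id(uid, name):
    """Join the non-null parts with "-" so a missing uid doesn't null out the whole ID."""
    parts = [str(part) for part in (uid, name) if part is not None]
    return "-".join(parts) if parts else None


# Hypothetical source records.
records = [
    {"uid": 101, "name": "Example Park", "county": "Kings"},
    {"uid": None, "name": "Park Without UID", "county": "Queens"},
    {"uid": 202, "name": "Upstate Park", "county": "Albany"},
]

nyc_variable_ids = [
    build_variable_id(record["uid"], record["name"])
    for record in records
    if record["county"] in NYC_COUNTIES
]
```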

sf-dcp commented 1 month ago

Update as of 4/9/24: Shadows/Open Space logic has been implemented except for 1 outstanding item (filter WPAA data by status). It appears that the WPAA recipe currently uses an incorrect Esri link to pull the data, which is why the status column is absent.

TODO:

[x] update WPAA recipe with new esri link and archive new data
[x] update logic for WPAA table to filter by status

croswell81 commented 1 month ago

@sf-dcp the link to the wpaa rest feature service that will be updated is: https://services5.arcgis.com/GfwWNkhOj9bNBqoJ/arcgis/rest/services/nywpaa/FeatureServer/0

croswell81 commented 1 month ago

@sf-dcp BYTES has also been updated with the correct link now.

damonmcc commented 1 month ago

from @croswell81

New Natural Resources dataset. Three check flag fields from DOB, I added each as a separate dataset in the CEQR Type II Data Source Review doc. They are NYCDOB Tidal Wetland, NYCDOB Freshwater Wetland, NYCDOB Coastal Erosion Hazard Area.

One table with bbl and flag (X or null) field for each variable. Join to PLUTO and create a lot based dataset for each of the three variables.
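
The join described above could look roughly like this. The flag column names and the BBL values are hypothetical; only the "X or null" convention and the three variable names come from the description.

```python
# Hypothetical column names in the DOB table, mapped to the three variable names.
DOB_FLAG_COLUMNS = {
    "tidal_wetland": "NYCDOB Tidal Wetland",
    "freshwater_wetland": "NYCDOB Freshwater Wetland",
    "coastal_erosion_hazard_area": "NYCDOB Coastal Erosion Hazard Area",
}

# Hypothetical wide table: one row per bbl, each flag is "X" or None.
dob_flags = [
    {"bbl": "1", "tidal_wetland": "X", "freshwater_wetland": None,
     "coastal_erosion_hazard_area": None},
    {"bbl": "2", "tidal_wetland": None, "freshwater_wetland": "X",
     "coastal_erosion_hazard_area": "X"},
    {"bbl": "9", "tidal_wetland": "X", "freshwater_wetland": None,
     "coastal_erosion_hazard_area": None},  # bbl not present in PLUTO
]

# Hypothetical list of PLUTO lots.
pluto_bbls = ["1", "2", "3"]


def lots_by_variable(pluto_bbls, dob_flags):
    """For each DOB variable, list the PLUTO lots whose flag is "X"."""
    flags_by_bbl = {row["bbl"]: row for row in dob_flags}
    return {
        variable_name: [
            bbl for bbl in pluto_bbls
            if flags_by_bbl.get(bbl, {}).get(column) == "X"
        ]
        for column, variable_name in DOB_FLAG_COLUMNS.items()
    }


flagged_lots = lots_by_variable(pluto_bbls, dob_flags)
```

Starting from the PLUTO lots means a flagged bbl with no matching lot is dropped, which is one possible reading of "join to PLUTO".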

fvankrieken commented 2 days ago

Just adding todos in one place for myself, a bit redundant but just for convenience.

[x] archive nyc_beaches_20240509.zip
[x] archive railyards_hudsonyards_erase.zip with an informative name
[x] archive above 4 datasets
[x] add logic for railyards
[ ] add logic for remaining natural resources

damonmcc commented 2 days ago

@jackrosacker @caseysmithpgh just tagging yall here because you'll have to add aliases for 3 new rows in the source_data_versions table once this is done

fvankrieken commented 2 days ago

@jackrosacker - the data from DOB is just a wide table of flags per bbl. Do you still want this exported with the source data in some way? It's a bit different from say CATS permits, where we look up a bbl but actually use that geometry to create a buffer, rather than just using the source dataset to determine if there's a flag. This is more like E-Des in that way. So if it sounds good, I think it would just make sense to include these flags in the final table without exporting a source layer.

But of course, if it'd be useful to have a feature that's every lot that has these specific flags, we can easily add that. let me know

damonmcc commented 1 day ago

@fvankrieken

@jackrosacker - the data from DOB is just a wide table of flags per bbl. Do you still want this exported with the source data in some way? It's a bit different from say CATS permits, where we look up a bbl but actually use that geometry to create a buffer, rather than just using the source dataset to determine if there's a flag. This is more like E-Des in that way. So if it sounds good, I think it would just make sense to include these flags in the final table without exporting a source layer.

(after chats with @jackrosacker and @fvankrieken)

E-Des is a good comparison for these tabular (rather than spatial) variables. we won't export source data layers for tabular variables

and since the new Exposed Rail Yards will be an input to the existing Exposed Railway question/flag, these are the potential export layer impacts

if Exposed Rail Yards is polygons, add a new model/layer named source__exposed_railway_polys
if Exposed Rail Yards is lines, source__exposed_railway_lines will have more records
either way, the single record in source__exposed_railway_buffer will increase in area

fvankrieken commented 1 day ago

@damonmcc thanks for the clear write-up.

Railyards are polys. These two have the same flag_id_field_name then, should they have the same variable_type as well? I would lean with keeping them distinct here, but if you think it aligns better with other things to have them the same that's fine too.

Re the dob natural resource flags - these will just get wrapped up in the other natural resources, correct? And if so, what should their variable_type and variable_id be? Since there's one row per bbl in the dataset (so each bbl can have at most one "NYCDOB Tidal Wetland" flag), maybe just have variable type and id be the same for legibility? "nycdob_tidal_wetland"? or something like that?

damonmcc commented 1 day ago

@fvankrieken

for Rail Yards; agree that a distinct variable_type makes more sense

for the DOB natural resource flags: agree the variable type and ID should be the same here (unlike E-Des where we get distinct IDs). but I think the variable ID should be more legible than the type (like Archeologic Areas)

jackrosacker commented 1 day ago

E-Des is a good comparison for these tabular (rather than spatial) variables. we won't export source data layers for tabular variables

In order to symbolize the DOB features on the map, will they be exported as part of the unioned Natural Resources dataset, or am I understanding that since these are per-lot attributes you won't be exporting any data and I should be displaying a view of the lots dataset with each DOB lot flagged?

NYCPlanning / data-engineering