Open damonmcc opened 2 months ago
(not vital, just a thought)
for geometries that are eventually checked for intersections with all lots during int__spatial_flags, it may be nice to distinguish between buffered and non-buffered polygons before that model uses them
they could all still end up in the same table, but that table is currently called int__all_buffers
after new Shadows data has been added to GFT, the DAG with the filter --select intermediate is below.
while adding logic to use the new data, I'll try to add a test we've talked about which would warn or error when new variables like nyc_parks_properties don't appear in the final table
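A minimal sketch of the kind of check described above, assuming the final table can be read as a list of row dictionaries and that variable names show up as values in a variable_type column. The function name, row layout, and sample values are hypothetical, not the project's actual test.

```python
def missing_variables(expected, final_rows, column="variable_type"):
    """Return expected variable names that never appear in the final table."""
    # Collect every non-null value seen in the (assumed) variable column.
    observed = {row.get(column) for row in final_rows} - {None}
    return sorted(set(expected) - observed)


# Hypothetical sample data: only one of the two expected variables is present.
expected = ["nyc_parks_properties", "nys_parks_properties"]
final_rows = [
    {"bbl": "1000010001", "variable_type": "nys_parks_properties"},
    {"bbl": "1000010002", "variable_type": "nys_parks_properties"},
]

missing = missing_variables(expected, final_rows)
if missing:
    # A real test could raise here instead of printing, to error rather than warn.
    print(f"WARNING: variables missing from final table: {missing}")
```

Whether this should warn or fail the build is a choice the sketch leaves open.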
for some reason during the ST_INTERSECTS part of int_flags__spatial, each lot is "intersecting" twice with polygons from int_buffers__nys_parks_properties
the buffered NY State Parks polygons seem ok though
Was curious if there was an odd data issue so poked around. Duplicated line here!
Damon edit after offline chat: they aren't duplicated
looks like int_buffers__us_parks_properties is actually just stg__nys_parks_properties, so the union all in int_buffers_all has duplicates
@sf-dcp looks like we can drop the int_buffers__us_parks_properties model? I don't see any mention of US Parks in GIS's source data excel sheet
Wow that's a great catch! int_buffers__us_parks_properties doesn't use the correct staging table. It should use stg__us_parks_properties instead. US Parks is listed as Federal Parks property in the GIS spreadsheet
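The duplicates came from two buffer models reading the same staging table before the union all. A rough sketch of how such overlap could be surfaced, assuming each unioned row can be reduced to a (source model, geometry) pair. The row layout, the WKT strings, and the NYC model name are made up for illustration.

```python
from collections import defaultdict


def shared_geometries(rows):
    """Map each geometry to the source models that contribute it, keeping
    only geometries contributed by more than one model."""
    models_by_geom = defaultdict(set)
    for model, geom_wkt in rows:
        models_by_geom[geom_wkt].add(model)
    return {
        geom: sorted(models)
        for geom, models in models_by_geom.items()
        if len(models) > 1
    }


# Hypothetical rows after a union all: two models emit the same polygon.
rows = [
    ("int_buffers__nys_parks_properties", "POLYGON((0 0, 1 0, 1 1, 0 0))"),
    ("int_buffers__us_parks_properties", "POLYGON((0 0, 1 0, 1 1, 0 0))"),
    ("int_buffers__nyc_parks_properties", "POLYGON((5 5, 6 5, 6 6, 5 5))"),  # assumed model name
]

for geom, models in shared_geometries(rows).items():
    print(f"{geom} appears in: {', '.join(models)}")
```

Comparing geometries as text only catches exact copies, which is the case here since both models selected from the same staging table.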
It doesn't seem to be duplicated as NYS and NYC properties are different
notes from DE & GIS chat on 4/2
Hi @croswell81 & @jackrosacker,
I've been working on processing steps for Shadows/Open Space data, and I have questions/concerns. Please see them below by variable name:
Federal Parks Property
gnis_id column is Null for several (6) records that are not in NYC. So potentially in the future it may be Null for NYC records, resulting in Null values for variable_id
NYS Parks Property
Bronx, Kings, New York, Queens, Richmond. The source data doesn't have records with New York. Also, it mentions not to include Manhattan - need to confirm
uid column is Null for 164 records, 2 of which are within NYC. This results in Null values for variable_id as uid-name
NYC Parks Property
typecategory and we initially had categories to exclude from an email thread. In the spreadsheet, it doesn't exclude Cemetery - do we now want cemeteries to be in the data?
WPAA
Status. There is no status column in the source data.
If it's easier to meet to go over these questions, LMK!
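To illustrate the uid concern above: a small sketch, assuming variable_id is built by joining uid and name with a hyphen and that a null part makes the whole id null (as SQL's || concatenation does in Postgres). The helper name and sample records are hypothetical.

```python
def build_variable_id(uid, name):
    """Mimic `uid || '-' || name`: any null part makes the result null."""
    if uid is None or name is None:
        return None
    return f"{uid}-{name}"


# Made-up records: the second one has no uid, so its variable_id is null.
records = [
    {"uid": "101", "name": "Example State Park"},
    {"uid": None, "name": "Unnamed Parcel"},
]

ids = [build_variable_id(r["uid"], r["name"]) for r in records]
null_count = sum(1 for value in ids if value is None)
print(f"{null_count} of {len(ids)} records have a null variable_id")
```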
@sf-dcp
I asked Planning Support if they wanted to use Global ID for most of the open space datasets that have one, since it is the only unique id, but they were only concerned with name.
Update as of 4/9/24: Shadows/Open Space logic has been implemented except for 1 outstanding item (filter WPAA data by status). It appears that the WPAA recipe currently uses an incorrect Esri link to pull the data, which is why the status column is absent.
TODO:
@sf-dcp the link to the wpaa rest feature service that will be updated is: https://services5.arcgis.com/GfwWNkhOj9bNBqoJ/arcgis/rest/services/nywpaa/FeatureServer/0
@sf-dcp BYTES has also been updated with the correct link now.
from @croswell81
New Natural Resources dataset. Three check flag fields from DOB, I added each as a separate dataset in the CEQR Type II Data Source Review doc. They are NYCDOB Tidal Wetland, NYCDOB Freshwater Wetland, NYCDOB Coastal Erosion Hazard Area.
One table with bbl and flag (X or null) field for each variable. Join to PLUTO and create a lot based dataset for each of the three variables.
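A small sketch of that reshaping, assuming the DOB table has one row per bbl with an "X" or null in each flag column. The column names are hypothetical stand-ins for the three DOB fields, and the PLUTO join is reduced to a set-membership check.

```python
# Hypothetical column names for the three DOB check-flag fields.
FLAG_COLUMNS = ["tidal_wetland", "freshwater_wetland", "coastal_erosion_hazard_area"]


def lots_by_flag(dob_rows, pluto_bbls, flag_columns=FLAG_COLUMNS):
    """Return {flag column: sorted bbls flagged "X"}, limited to bbls in PLUTO."""
    flagged = {column: [] for column in flag_columns}
    for row in dob_rows:
        if row["bbl"] not in pluto_bbls:
            continue  # inner join: skip lots that are not in PLUTO
        for column in flag_columns:
            if row.get(column) == "X":
                flagged[column].append(row["bbl"])
    return {column: sorted(bbls) for column, bbls in flagged.items()}


# Made-up rows; the last bbl is not in PLUTO and is dropped by the join.
dob_rows = [
    {"bbl": "1000010001", "tidal_wetland": "X", "freshwater_wetland": None,
     "coastal_erosion_hazard_area": "X"},
    {"bbl": "1000010002", "tidal_wetland": None, "freshwater_wetland": None,
     "coastal_erosion_hazard_area": None},
    {"bbl": "9999999999", "tidal_wetland": "X", "freshwater_wetland": None,
     "coastal_erosion_hazard_area": None},
]
pluto_bbls = {"1000010001", "1000010002"}

print(lots_by_flag(dob_rows, pluto_bbls))
```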
Just adding todos in one place for myself, a bit redundant but just for convenience.
@jackrosacker @caseysmithpgh just tagging yall here because you'll have to add aliases for 3 new rows in the source_data_versions table once this is done
@jackrosacker - the data from DOB is just a wide table of flags per bbl. Do you still want this exported with the source data in some way? It's a bit different from say CATS permits, where we look up a bbl but actually use that geometry to create a buffer, rather than just using the source dataset to determine if there's a flag. This is more like E-Des in that way. So if it sounds good, I think it would just make sense to include these flags in the final table without exporting a source layer.
But of course, if it'd be useful to have a feature that's every lot that has these specific flags, we can easily add that. let me know
@fvankrieken
(after chats with @jackrosacker and @fvankrieken)
E-Des is a good comparison for these tabular (rather than spatial) variables. we won't export source data layers for tabular variables
and since the new Exposed Rail Yards will be an input to the existing Exposed Railway question/flag, these are the potential export layer impacts:
source__exposed_railway_polys
source__exposed_railway_lines will have more records
source__exposed_railway_buffer will increase in area
@damonmcc thanks for the clear write-up.
Railyards are polys. These two have the same flag_id_field_name then, should they have the same variable_type as well? I would lean toward keeping them distinct here, but if you think it aligns better with other things to have them the same that's fine too.
Re the DOB natural resource flags - these will just get wrapped up in the other natural resources, correct? And if so, what should their variable_type and variable_id be? Since there's one row per bbl in the dataset (so each bbl can have at most one "NYCDOB Tidal Wetland" flag), maybe just have variable type and id be the same for legibility? "nycdob_tidal_wetland"? or something like that?
@fvankrieken
for Rail Yards: agree that a distinct variable_type makes more sense
for the DOB natural resource flags: agree the variable type and ID should be the same here (unlike E-Des where we get distinct IDs). but I think the variable ID should be more legible than the type (like Archeologic Areas)
In order to symbolize the DOB features on the map, will they be exported as part of the unioned Natural Resources dataset, or am I understanding that since these are per-lot attributes you won't be exporting any data and I should be displaying a view of the lots dataset with each DOB lot flagged?
see meeting notes here and source data spreadsheet here
extract new source data
adding logic for new variables
exporting new source data
GFT dataset details
Category -> Variables -> Source
more GFT dataset details
Category -> Variables
Variable -> Sources
mappings documented (similar to the source data excel file).
Potential approach to confirming existing variables
variables.csv cover all expected GFT variables by comparing it to the source data excel file)
Potential approach to adding a new/missing variable (builds/tests fail until the last step)
variables.csv
dcpy/library/templates/ YAML file for the source (if needed)
recipe.yml and _sources.yml (if needed)
variable_id for null and unique
int_buffers__all
test_expected_pilot_projects.csv by copying the table the csv was compared to
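The null and unique checks on variable_id mentioned in the steps above are similar in spirit to dbt's not_null and unique generic tests. Outside of dbt they could look roughly like this. It is a sketch only: the helper name and sample ids are hypothetical, and it checks a flat list of ids rather than any particular model.

```python
from collections import Counter


def check_variable_ids(variable_ids):
    """Return (number of null ids, sorted list of ids that occur more than once)."""
    null_count = sum(1 for value in variable_ids if value is None)
    counts = Counter(value for value in variable_ids if value is not None)
    duplicates = sorted(value for value, n in counts.items() if n > 1)
    return null_count, duplicates


# Made-up ids: one null and one repeated value.
ids = ["park-a", "park-b", None, "park-b"]
null_count, duplicates = check_variable_ids(ids)
print(f"null ids: {null_count}, duplicated ids: {duplicates}")
```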