NewYorkCityCouncil / vacant_storefronts

Analyzing vacant storefronts dataset

Code Request 2 #2

Open rhirotacouncil opened 1 year ago

rhirotacouncil commented 1 year ago

Code review request for

Some things:

amd112 commented 1 year ago

I'm going to be pushing small edits to the cr branch. Feel free to look at what I've changed there, take what you like, and drop what you don't. I'll also post a lil comment here as I get to each file. I'm hitting a lot of leaflet/shapefile replication issues; I mostly solve them below, but they're likely just versioning differences, so my fixes might break things for you.

Storefronts_cd_map.Rmd

Storefronts_clusterinng_map.Rmd

will pop through the next few files after a break!

amd112 commented 1 year ago

Storefronts_nta_map_21_22.Rmd

median_income_vacancy_21_22.R

Screenshot 2023-08-16 at 3 37 10 PM Screenshot 2023-08-16 at 3 43 02 PM

[e: Excusing the awful formatting because it's just a test. Each bar here represents all tracts within a certain income bracket (i.e., the first bar is the average vacancy across all tracts with a median income between 10k and 20k). This shows a really different trend than I think the first chart implied. I haven't put too much thought into it yet, and maybe I'm grouping things in a way that isn't reasonable, but it gives an almost inverse takeaway from the scatterplot format. It feels consistent with the cd map, where the highest-vacancy zones are fairly high income: lower Manhattan and NW Brooklyn.

image

]
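For anyone wanting to reproduce the bracketed bar chart described above, here is a minimal sketch of the grouping step. The data frame, column names (median_income, vacancy_rate), and values are all made up for illustration and are not taken from median_income_vacancy_21_22.R. It also takes a plain unweighted mean of tract-level rates, which is an assumption about how the bars were computed.

```r
library(dplyr)

# Hypothetical tract-level data; names and values are illustrative only.
tracts <- tibble(
  median_income = c(12000, 18000, 25000, 27000, 95000, 99000),
  vacancy_rate  = c(0.10, 0.20, 0.05, 0.15, 0.30, 0.20)
)

bracket_summary <- tracts %>%
  mutate(
    # 10k-wide brackets, closed on the left: [10000, 20000), [20000, 30000), ...
    income_bracket = cut(
      median_income,
      breaks = seq(0, 100000, by = 10000),
      right = FALSE,
      dig.lab = 6
    )
  ) %>%
  group_by(income_bracket) %>%
  summarize(
    mean_vacancy = mean(vacancy_rate), # unweighted mean of tract rates
    n_tracts = n(),                    # how many tracts sit behind each bar
    .groups = "drop"
  )

print(bracket_summary)
```

Keeping n_tracts next to the mean is deliberate: a bar built from a handful of tracts can swing a lot, which may be part of why the bracketed view and the scatterplot tell different stories.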

amd112 commented 1 year ago

sf_rent_explore_21_22.R

feel free to ping me about anything above - and would love any general thoughts

romartinez-nycc commented 1 year ago

Thanks, @amd112, for flagging this! I agree with combining the geographies & will send an email to the Open Data folk asking them to fix the census tracts / prevent them from rounding. Combining the shapes is a great idea, and I think it's also worth fixing the original map...

Storefronts_clusterinng_map.Rmd

  • Reads in the vacancy data as ct_vacant_dataset but then refers to it as leased_not_leased_2022, so changed it to match the Storefronts_cd_map.Rmd naming.
  • Changed the chunk around row 46 to be tidyverse lingo just to help me understand. I still don't quite understand
  • In the chunk starting row 55 we're dropping the decimal so that we can merge with the storefront data, which only provides the round numbers for the locations. It's BANANAS to me that they provide the data that way: by removing the decimals they are effectively using old census tract designations, but only for tracts where there has been enough growth to merit dividing them. Frankly it makes no sense that they did that, but given that we're trying to make it work, I've at least condensed our spatial file. Previously we were merging each storefront to all tracts that it could match (i.e., if the storefront is reported to be in CT 2, we would merge it with both CT 2.01 and CT 2.02); now I've combined the shapes so that there is only one shape, labelled 2. We could use the lat/lon to find the correct 2020 tract, but there are some missing obs. The spatial difference is less bad than I had thought it might be: places where you see a red line are where I've combined two (or more) tracts.
image
  • I didn't go through the classification code, but I'd recommend re-running that part with the group_by(boro_ct201) %>% summarize(geometry = st_union(geometry)) addition.
  • The nested ifelse on row 118 is perfect for a case_when!
  • I think unsupervised clustering can be really confusing for people who don't have a background in cs/stats, so understanding what our takeaway from this clustering is will be really important for communicating it. It may be useful to create another graphic that shows the differences between these clusters more directly (e.g., different vacancy trends by size would be a great scatterplot!)
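To make the tract-combining step concrete, here is a rough sketch of dropping the decimal and dissolving the shapes with st_union. The toy squares, the ct_label column, and the ct_rounded key are hypothetical stand-ins; the real file groups on boro_ct201, whose exact format isn't shown in this thread.

```r
library(dplyr)
library(sf)

# Helper: a unit square with its lower-left corner at (x, y).
unit_square <- function(x, y) {
  st_polygon(list(rbind(
    c(x, y), c(x + 1, y), c(x + 1, y + 1), c(x, y + 1), c(x, y)
  )))
}

# Hypothetical tract shapes: 2.01 and 2.02 are adjacent pieces of an older tract 2.
tracts <- st_sf(
  ct_label = c("2.01", "2.02", "3"),
  geometry = st_sfc(unit_square(0, 0), unit_square(1, 0), unit_square(3, 0))
)

combined <- tracts %>%
  # Drop the decimal suffix so the key matches the rounded storefront tract numbers.
  mutate(ct_rounded = sub("\\..*$", "", ct_label)) %>%
  group_by(ct_rounded) %>%
  # Dissolve every tract sharing a rounded number into a single shape.
  summarize(geometry = st_union(geometry))

print(combined)
```

After this, each storefront matches exactly one shape per rounded tract number instead of being merged to every split tract.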
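On the case_when suggestion, a small before/after sketch may help. The vacancy_rate column, the cutoffs, and the labels are invented for illustration, since the actual logic on row 118 isn't reproduced in this thread.

```r
library(dplyr)

# Hypothetical data and thresholds; not the values used in the Rmd.
df <- tibble(vacancy_rate = c(0.02, 0.08, 0.15, NA))

df <- df %>%
  mutate(
    # Nested ifelse: each extra category adds another level of nesting.
    nested = ifelse(vacancy_rate < 0.05, "low",
               ifelse(vacancy_rate < 0.10, "medium", "high")),
    # case_when: conditions are checked top to bottom, first match wins.
    flat = case_when(
      vacancy_rate < 0.05 ~ "low",
      vacancy_rate < 0.10 ~ "medium",
      !is.na(vacancy_rate) ~ "high" # rows matching nothing (here the NA) stay NA
    )
  )

print(df)
```

Both columns come out the same here; the case_when version just reads as a flat list of rules, which is easier to extend if more categories get added.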

will pop through the next few files after a break!