Open rhirotacouncil opened 1 year ago
I'm going to be pushing small edits to the `cr` branch - feel free to look at what I've changed there and take what you like + drop what you don't. I'll also post a lil comment here as I get to each file. I'm having a lot of leaflet/shapefile replication issues; I mostly solve them below, but they're likely just versioning differences, so my fixes might break it for you.
`vacant_cd` and `vacant_cd.shp`: all I did was move the `ungroup() %>%` up before the left join and add `as.data.frame()`. Did this because otherwise I got an sf/tbl object which couldn't be plotted.
`Storefronts_clusterinng_map.Rmd`:
Reads in the vacancy data as `ct_vacant_dataset` but then refers to it as `leased_not_leased_2022`, so changed it to match the `Storefronts_cd_map.Rmd` naming.
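A minimal sketch of the `ungroup()` / `as.data.frame()` ordering described above, using plain tibbles rather than the actual sf data. The table and column names (`vacancies`, `cd_names`, `cd`, `vacant`) are hypothetical, and this only illustrates the order of operations, not the real chunk:

```r
library(dplyr)

# Hypothetical stand-ins for the vacancy data and a lookup table.
vacancies <- tibble(cd = c("101", "101", "102"), vacant = c(1, 0, 1))
cd_names  <- tibble(cd = c("101", "102"), name = c("District A", "District B"))

vacant_by_cd <- vacancies %>%
  group_by(cd) %>%
  mutate(n_in_cd = n()) %>%            # still a grouped tibble at this point
  ungroup() %>%                        # drop the grouping before the join
  left_join(cd_names, by = "cd") %>%
  as.data.frame()                      # plain data.frame, no tbl classes

class(vacant_by_cd)
```

Calling `as.data.frame()` last means whatever the join returns is flattened to a plain data frame before it is handed to the plotting code.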
Changed the chunk around row 46 to be tidyverse lingo just to help me understand. I still don't quite understand
In the chunk starting row 55 we're dropping the decimal in order to be able to merge with the storefront data, which only provides the round numbers for the locations. It's BANANAS to me that they provide the data that way: by removing the decimals they are deciding to use old census tract designations, but only for tracts where there has been enough growth to merit them being divided. Frankly it makes no sense that they did that, but given that we're trying to make it work, I've at least condensed our spatial file. Previously we were merging each storefront to all tracts that it could match (i.e. if the storefront is reported to be in CT 2, we would merge it with both CT 2.01 and CT 2.02). Now I've combined the shapes so that there will only be one shape labelled 2. We could use the lat/lon to find the correct 2020 tract, but there are some missing obs. The spatial difference is less bad than I had thought it might be - places where you see a red line are where I've combined the two (or more) tracts.
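To make the duplication issue above concrete, here is a small sketch with made-up data. The column names (`ct_label`, `ct`, `storefront_id`) and the "2.01"-style label format are assumptions for illustration; the real tract IDs may be encoded differently:

```r
library(dplyr)

# Hypothetical tract labels and storefront records.
tracts      <- tibble(ct_label = c("2.01", "2.02", "3"))
storefronts <- tibble(storefront_id = 1:2, ct = c("2", "3"))

# Drop the decimal suffix so the tract key lines up with the storefront data.
tracts <- tracts %>% mutate(ct = sub("\\..*$", "", ct_label))

# Old approach: the storefront in CT 2 matches both 2.01 and 2.02.
dup_join <- left_join(storefronts, tracts, by = "ct")

# New approach: one row per rounded tract, so each storefront matches once.
tracts_combined <- distinct(tracts, ct)
one_join <- left_join(storefronts, tracts_combined, by = "ct")

c(old = nrow(dup_join), new = nrow(one_join))
```

The row counts differ because the old join repeats a storefront once per split tract, which would inflate any counts computed afterwards.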
I didn't go through the classification code, but would recommend re-running that part with the `group_by(boro_ct201) %>% summarize(geometry = st_union(geometry))` addition.
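A rough sketch of what that addition does, on toy geometry. Only `boro_ct201` and the pipeline itself come from the comment above; the rectangles, the `rect()` helper, and the tract values are invented, and the real file will have a CRS and more columns:

```r
library(sf)
library(dplyr)

# Illustrative helper: a unit-height rectangle spanning x0 to x1.
rect <- function(x0, x1) {
  st_polygon(list(rbind(c(x0, 0), c(x1, 0), c(x1, 1), c(x0, 1), c(x0, 0))))
}

# Two adjacent pieces that both round to tract "2", plus a separate tract "3".
tracts <- st_sf(
  boro_ct201 = c("2", "2", "3"),
  geometry   = st_sfc(rect(0, 1), rect(1, 2), rect(2, 3))
)

# Merge all shapes that share a rounded tract ID into a single shape.
combined <- tracts %>%
  group_by(boro_ct201) %>%
  summarize(geometry = st_union(geometry))

nrow(combined)
```

After the union there is one feature per rounded tract, so a join on `boro_ct201` can no longer match a storefront to more than one shape.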
The nested `ifelse` on row 118 is perfect for a `case_when`!
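For reference, a sketch of the kind of rewrite meant here. The `vacancy_rate` column, the cut-offs, and the labels are hypothetical; this is not the actual row-118 logic:

```r
library(dplyr)

# Hypothetical vacancy rates, including a missing value.
df <- tibble(vacancy_rate = c(0.02, 0.08, 0.20, NA))

# Nested ifelse: each extra category adds another level of nesting.
nested <- df %>%
  mutate(level = ifelse(vacancy_rate < 0.05, "low",
                 ifelse(vacancy_rate < 0.15, "medium", "high")))

# case_when: one condition per line, evaluated top to bottom.
flat <- df %>%
  mutate(level = case_when(
    vacancy_rate < 0.05  ~ "low",
    vacancy_rate < 0.15  ~ "medium",
    vacancy_rate >= 0.15 ~ "high"
  ))

identical(nested$level, flat$level)
```

Both versions leave the missing rate as `NA`, since no `case_when` condition is true for it.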
I think unsupervised clustering can be really confusing for people who don't have a background in cs/stats, so understanding what our takeaway from this clustering is will be really important in communicating it. It may be useful to create another graphic that shows the differences between these clusters more directly (e.g. different vacancy trends by size would be a great scatterplot!)
will pop through the next few files after a break!
Rewrote the `data.table` code in `dplyr` verbiage to help me understand what it was doing, and I got different numbers for Glendale + Springfield - not sure why.
`censusapi` is a new library to me (I use `tidycensus`), but this seems like it has some cool pros to it, so good to learn about it!
`merge(sr, med_inc, by = "geoid")` drops the tracts that don't match: not sure if it would be better to make that explicit in the text somehow or to make the same choice as above of basically merging the tracts on the census side as well. It is going to be dropping things non-randomly - looks like there are a lot more tracts in SI + Queens that will be affected.
[e: Excusing awful formatting bc it's just a test. Each bar here represents all tracts within a certain income bracket (i.e. the average vacancy for all tracts w a median income between 10k-20k for the first bar). This shows a really different trend than I think was implied by the first chart. I haven't put too much thought into it yet - maybe I'm grouping things in a way that isn't reasonable - but it shows an almost inverse takeaway from the scatterplot format. This feels consistent with the cd map, where the highest vacancy zones are fairly high income: lower Manhattan and NW Brooklyn.]
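A quick sketch of how one might check what the merge drops and build the bracketed averages, in base R. Only `sr`, `med_inc`, and `geoid` come from the comment above; the `boro`, `vacancy`, and `income` columns and all values are made up, and the 10k bracket width is taken from the description of the first bar:

```r
# Hypothetical data standing in for the storefront and income tables.
sr <- data.frame(geoid   = c("a", "b", "c", "d"),
                 boro    = c("SI", "SI", "Queens", "Manhattan"),
                 vacancy = c(0.10, 0.20, 0.30, 0.40))
med_inc <- data.frame(geoid  = c("a", "d", "e"),
                      income = c(15000, 25000, 90000))

# merge() keeps only matching geoids by default (all = FALSE).
merged <- merge(sr, med_inc, by = "geoid")

# Rows of sr that the merge silently discards, tallied by borough.
dropped <- sr[!sr$geoid %in% med_inc$geoid, ]
table(dropped$boro)

# Average vacancy per 10k median-income bracket (one bar per bracket).
merged$bracket <- cut(merged$income, breaks = seq(0, 100000, by = 10000),
                      dig.lab = 6)
by_bracket <- aggregate(vacancy ~ bracket, data = merged, FUN = mean)
by_bracket
```

Tallying `dropped` by borough is one way to see whether the loss is concentrated in particular areas before deciding how to handle it in the text.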
`~/utils/unzip_files.R` [e: I think this was just for `unzip_sf` instead of loading councildown!]
Changed the `missing_storefront_2.csv` reference to a relative path that matches other files.
Feel free to ping me about anything above - and would love any general thoughts.
Thanks, @amd112, for flagging this! I agree with combining the geographies & will send an email to the Open Data folk asking them to fix the census tracts / prevent them from rounding. Combining the shapes is a great idea; I think it's also worth fixing the original map...