Open etbartell opened 4 years ago
I'm mentioning @Anthony-Howell-PhD to make sure he is getting the notifications.
That code looks fine; the problem could be something you are doing earlier. Can you include a reproducible example?
Here's everything I'm running up to that point:
Step 1: Load packages
library(sf)
library(tidyverse)
library(tigris)
library(tidycensus)
library(ggrepel)
library(dplyr)
options(tigris_use_cache=TRUE)
options(tigris_class="sf")
Step 2: Input Census API Key
census_api_key("8eab9b16f44cb26460ecbde164482194b7052772")
Step 3: Get education data and create a data frame (see screenshot below for acs at this point)
acs <- get_acs("tract", table = "B15003", cache_table = TRUE,
geometry = TRUE, state = "AZ", county = "Maricopa County",
year = 2017, output = "tidy")
Step 4: Transform data to grouped factor levels (see screenshot for transformed acs after this step)
acs <-
acs %>%
mutate( id = str_extract(variable, "[0-9]{3}$") %>% as.integer ) %>%
# variable 1 is the "total", which is just the sum of the others
filter(id > 1) %>%
mutate(education = case_when(
id %>% between(2, 16) ~ "No HS diploma",
id %>% between(17, 21) ~ "HS, no Bachelors",
id > 21 ~ "At Least a Bachelors"
)) %>%
group_by(GEOID, education) %>%
summarise(estimate = sum(estimate))
This code works for me.
# load libraries
library(sf)
library(tidyverse)
library(tigris)
library(tidycensus)
library(ggrepel)
options(tigris_use_cache=TRUE)
options(tigris_class="sf")
census_api_key("8eab9b16f44cb26460ecbde164482194b7052772")
#Bring in 2017 variable related to educational attainment
acs <- get_acs("tract", table = "B15003", cache_table = TRUE,
geometry = TRUE, state = "AZ", county = "Maricopa County",
year = 2017, output = "tidy")
acs
#The educational attainment splits things out to quite a few levels (with one for “finished 4th grade” and another for “finished 5th grade” and so on), so I’ll collapse them down to a handful of categories.
acs <- acs %>%
mutate(
id = str_extract(variable, "[0-9]{3}$") %>% as.integer
) %>%
# variable 1 is the "total", which is just the sum of the others
filter(id > 1) %>%
mutate(education =case_when(
id %>% between(2, 16) ~ "No HS diploma",
id %>% between(17, 21) ~ "HS, no Bachelors",
id > 21 ~ "At Least a Bachelors"
)) %>%
group_by(GEOID, education) %>%
summarise(estimate = sum(estimate))
acs
Yeah I copy-pasted your code just to make sure I didn't miss something and it's still giving me the same result. I know conceptually what each step is supposed to do to the data but for some reason it's ignoring everything except the "summarise" step.
That is strange. I can see if anyone else has this issue. In the meantime, here are some options: 1) make sure all the loaded packages are updated; 2) run the code in R instead of Rmarkdown; and 3) break the code down into its individual parts without the piping. You can trace what is going on that way.
I am having trouble running the Generating Dots chunk of the Part III .rmd file. I have not changed anything; I was just trying to run the sample file.
acs_split <- acs %>%
filter(estimate > 50) %>%
split(.$education)
generate_samples <- function(data)
suppressMessages(st_sample(data, size = round(data$estimate / 100)))
points <- map(acs_split, generate_samples)
points <- imap(points,
~st_sf(data_frame(education = rep(.y, length(.x))),
geometry = .x))
points <- do.call(rbind, points)
Error in split.default(x = seq_len(nrow(x)), f = f, drop = drop, ...) : group length is 0 but data length > 0
@Anthony-Howell-PhD @sunaynagoel I am having the same issue
If you try to run code from part 3 with code from parts 1 and 2 in the same .rmd file, it will not compile correctly. To avoid this error, make sure that the code from part 3 RMD file is placed into a new, separate RMD file from Parts 1 and 2. Let me know if that helps.
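For anyone hitting the same message: one way this error can arise (an assumption, not something confirmed in this thread) is that acs has no education column when the chunk runs, for example because the transformation step was not run in the current session or acs was overwritten by code from another part of the lab. In that case .$education is NULL and split() has nothing to group by. A minimal sketch with made-up data, using base R only:

```r
# Toy data frame standing in for `acs`; the values are made up and
# the `education` column is deliberately missing.
toy <- data.frame(GEOID = c("A", "B", "C"), estimate = c(60, 70, 80))

# toy$education is NULL, so split() fails; capture the message instead of stopping
msg <- tryCatch(
  split(toy, toy$education),
  error = function(e) conditionMessage(e)
)
print(msg)

# A simple guard to run before the split step
if (!"education" %in% names(toy)) {
  message("`education` column is missing: re-run the transformation step first")
}
```

If the guard fires on your real acs object, re-running the steps that create the education column in the same session should be the first thing to try.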
Having the same issue. Has this been solved yet?
@cjbecerr have you tried any of the following?:
1) Make sure all the loaded packages are updated.
2) Run the code in R instead of Rmarkdown.
3) Break the code down into its individual parts without the piping.
If after doing (1)-(2), the issue is still not resolved, doing step (3) above will help you trace what is going on and why the data is collapsing to only one cell.
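As a rough illustration of step (3), here is the same transformation written without pipes, with one intermediate object per step so each result can be inspected. The toy table is made up and only mimics the GEOID, variable and estimate columns of the real acs data; the step1 to step5 names are hypothetical.

```r
library(dplyr)
library(stringr)

# Made-up rows that mimic the tidy get_acs() output (no geometry)
toy <- tibble(
  GEOID    = rep(c("04013000101", "04013000102"), each = 4),
  variable = rep(c("B15003_001", "B15003_010", "B15003_017", "B15003_022"), times = 2),
  estimate = c(100, 20, 50, 30, 200, 40, 100, 60)
)

# One object per step; print any of them to see where things go wrong
step1 <- mutate(toy, id = as.integer(str_extract(variable, "[0-9]{3}$")))
step2 <- filter(step1, id > 1)   # drop the "total" row
step3 <- mutate(step2, education = case_when(
  between(id, 2, 16)  ~ "No HS diploma",
  between(id, 17, 21) ~ "HS, no Bachelors",
  id > 21             ~ "At Least a Bachelors"
))
step4 <- group_by(step3, GEOID, education)
print(is_grouped_df(step4))      # should be TRUE if group_by() took effect
step5 <- summarise(step4, estimate = sum(estimate))
print(step5)
```

If step4 is grouped but step5 still collapses to a single row, the summarise() being called is probably not the dplyr one.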
@Anthony-Howell-PhD After troubleshooting, I'm seeing everything works fine in both R and Rmarkdown until it reaches the summarise(). It's like it ignores the group_by() and takes the sum with no groupings.
Ok, sorry to hear that. Can you attach your .R file (not .rmd file) and I will check it out.
I am also having this issue even though my code from parts 1 & 2 are in a separate rmd file than part 3
Figured it out. Actually turned out to be my packages, needed to restart R. Thank you for the help!
@sunaynagoel @JaesaR. Thanks, Jaesa, for sending your .R file to me. I've run through each line of code and I cannot reproduce the error you are both getting. Everything works fine on my end. Please do the following: (1) update all of the library packages in RStudio and re-run the code; (2) if the problem persists, replace ~st_sf(data_frame(education = rep(.y, length(.x))), with ~st_sf(tibble(education = rep(.y, length(.x))),
On my end, I get a warning that data_frame is deprecated, and to use tibble instead. That may be the problem, depending on what version of the packages you are running.
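A small sketch of the suggested swap, run on made-up point sets rather than the output of st_sample(), so it does not need the Census download. The two-element list is a hypothetical stand-in for the points object in the lab code.

```r
library(sf)
library(purrr)
library(tibble)

# Made-up point geometries, one set per education level
points <- list(
  "No HS diploma"        = st_sfc(st_point(c(0, 0)), st_point(c(1, 1))),
  "At Least a Bachelors" = st_sfc(st_point(c(2, 2)))
)

# Same imap() call as in the lab, with tibble() in place of data_frame()
points <- imap(points,
               ~ st_sf(tibble(education = rep(.y, length(.x))),
                       geometry = .x))

# Stack the per-level sf objects into one
points <- do.call(rbind, points)
print(points)
```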
Thank you for your help. I was able to make mine work.
@Anthony-Howell-PhD. Is it OK to submit just the .rmd file? Everything works when I run the chunks separately in the R console, but when I try to produce an HTML file, it gives errors. Thanks
In this case you can upload both the .rmd and .R files. I can check the .R file more easily to make sure everything is working correctly.
Hello @sunaynagoel, How were you able to make it work? I have run into the same issue.
@Jigarci3 Replacing ~st_sf(data_frame(education = rep(.y, length(.x))), with ~st_sf(tibble(education = rep(.y, length(.x))), as suggested by @Anthony-Howell-PhD, did the trick for part 3. For some reason my dplyr package was masking gridExtra for parts 1 and 2. I also had to restart R. Let me know if that helps.
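For reference, a hedged sketch of how one might check for this kind of masking. It assumes, without confirmation from this thread, that another attached package defines its own summarise; the toy table is made up. find() lists every attached package that defines a name, and the first entry is the one an unqualified call uses. Prefixing calls with dplyr:: sidesteps the question entirely.

```r
library(dplyr)

# Every attached package that defines `summarise`, in search-path order
where <- find("summarise")
print(where)   # the first entry is what a bare summarise() resolves to

# Made-up data standing in for the transformed ACS table
toy <- tibble(
  GEOID     = c("A", "A", "B", "B"),
  education = c("No HS diploma", "No HS diploma",
                "HS, no Bachelors", "At Least a Bachelors"),
  estimate  = c(10, 20, 30, 40)
)

# Explicit namespaces guarantee the dplyr versions are used
result <- toy %>%
  dplyr::group_by(GEOID, education) %>%
  dplyr::summarise(estimate = sum(estimate))
print(result)
```

Restarting R and loading only the packages a chunk needs, as described above, has a similar effect because it resets the search path.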
@Anthony-Howell-PhD I'm a bit confused how to find the .r files for submission. Is there a specific place I should look in RStudio? Thanks!
If you are not familiar with .R scripts, then sending only the .rmd file is fine. For your reference, though, an R script (.R) can be opened as shown in the screenshot.
Thank you!
I was just wondering, are we supposed to submit both rmd files? I know it says just to submit the files for part 3 but since it wouldn't run with the code from part 2, the plots for question 3 wouldn't be included in the submission.
Whichever is easiest for you is ok by me.
Hello all, I'm curious if I'm doing something wrong. In part 3, I'm trying to create my plots with the following code:
census_api_key("my_census_key")
Var<-c("B19013_001", "B25077_001")
## c(Median household Income, Median Housing Value)
CenDF <- get_acs(geography = "county",
variables = Var,
year = 2017,
survey = "acs5",
geometry = TRUE,
shift_geo = TRUE)
CenDF<-CenDF %>%
mutate(variable=case_when(
variable=="B19013_001" ~ "HHIncome",
variable=="B25077_001" ~ "HouseValue")) %>%
select(-moe) %>%
spread(variable, estimate) %>% #Spread moves rows into columns
mutate(HHInc_HousePrice_Ratio=round(HouseValue/HHIncome,2))
Var<-c("B19013_001","B25077_001")
# Download 2008-2012 df
CenDF2012 <- get_acs(geography = "county",
variables = Var,
year = 2012,
survey = "acs5",
geometry = FALSE)
#Create new variable for the housing price to income ratio.
CenDF2012<-CenDF2012 %>%
mutate(variable=case_when(
variable=="B19013_001" ~ "HHIncome2012",
variable=="B25077_001" ~ "HouseValue2012")) %>%
select(-moe,-NAME) %>%
spread(variable, estimate) %>% #Spread moves rows into columns
mutate(HHInc_HousePrice_Ratio2012=round(HouseValue2012/HHIncome2012,2))
CenDF<-merge(CenDF,CenDF2012,by.all="GEOID", all.x=TRUE)
CenDF<-CenDF %>%
mutate(pct_change = 100 * (`HHInc_HousePrice_Ratio` - `HHInc_HousePrice_Ratio2012`) / `HHInc_HousePrice_Ratio2012`)
```{r}
library(viridis)
library(gtools)
upper_limit <- round(max(CenDF$pct_change,na.rm=TRUE) + 10, -1)
lower_limit <- round(min(CenDF$pct_change,na.rm=TRUE) - 10, -1)
CenDF$fill_factor <- quantcut(CenDF$HHInc_HousePrice_Ratio, q = c(0,.1,.25,.5,.75,.9,1))
col.ramp <- viridis(n = 6)
Plot11<- ggplot(CenDF,aes(fill = pct_change)) +
geom_sf(size = 0) +
#geom_sf(data = major_roads_geo, color = "white", size = 0.8, fill = NA) +
#geom_sf(data = minor_roads_geo, color = "white", size = 0.4, fill = NA) +
scale_fill_manual("Price-Income Ratio",values = col.ramp)+
labs(title="Changes in House Price to Income Ratio",
subtitle = "2017 5-Year Estimates vs. 2012 5-Year Estimates for Census Tracts",
caption = paste0(
"Data sources:",
"\n U.S. Census Bureau, 2012 and 2017 American Community Survey 5-Year Estimates"
)
) +
theme(plot.caption = element_text(hjust = 0, margin = margin(t = 15))) +
theme(axis.ticks = element_blank(), axis.text = element_blank()) +
theme(panel.background = element_blank())
Plot11
```
My plot keeps giving me the following error:
Error: Continuous value supplied to discrete scale
It was my understanding that this line of code made the scale discrete:
CenDF$fill_factor <- quantcut(CenDF$HHInc_HousePrice_Ratio, q = c(0,.1,.25,.5,.75,.9,1))
Is this incorrect?
I figured it out after staring at my code for a while...I forgot to change the variable I was plotting.
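To illustrate the fix on something small: scale_fill_manual() is a discrete scale, so the fill aesthetic has to point at the factor column rather than a numeric one. The sketch below uses made-up numbers and geom_tile() instead of geom_sf(), and swaps in base cut() with quantile() for gtools::quantcut() to keep dependencies down. Those substitutions are assumptions for illustration, not the lab's code.

```r
library(ggplot2)
library(viridis)

# Made-up ratios; `x` is only there to give each tile a position
toy <- data.frame(
  x     = 1:6,
  ratio = c(1.5, 2.0, 2.8, 3.1, 4.4, 5.9)
)

# Bin the numeric ratio into a factor (two quantile bins here)
toy$fill_factor <- cut(toy$ratio,
                       breaks = quantile(toy$ratio, c(0, .5, 1)),
                       include.lowest = TRUE)

col.ramp <- viridis(n = nlevels(toy$fill_factor))

# fill = fill_factor (a factor) works with the manual scale;
# fill = ratio (numeric) would raise the "Continuous value" error
p <- ggplot(toy, aes(x = x, y = 1, fill = fill_factor)) +
  geom_tile() +
  scale_fill_manual("Price-Income Ratio", values = col.ramp)
print(p)
```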
Hi, I'm getting an error when I run this code.
acs_split <- acs %>%
filter(estimate > 50) %>%
split(.$education)
This is from Lab 5 Part 2 instructions for CPP 529. It can be found here : https://ds4ps.org/cpp-529-spr-2020/LABS/Lab5b-MapVis2.html under 'Generating Dots' heading.
Error in split.default(x=seq_len(nrow(x)), f=f, drop=drop, ...) : group length is 0 but data length > 0
But oddly enough, when I knit the entire document, the error doesn't appear and the document gets knitted fully. So, I don't know what is going on
Is anyone else having an issue in Part 3 where the acs and acs12 dataframes get collapsed to a single number when you use the summarise function? It seems to have just disregarded every command above the last one. Here's the code I used:
Then here's what it did to the dataframe: