DS4PS / cpp-529-spr-2020

Course shell for CPP 529 Data Practicum on Community Analytics for Spring 2020.
http://ds4ps.org/cpp-529-spr-2020

Lab 02 - Error Output must be unique #4

Open ecking opened 4 years ago

ecking commented 4 years ago

I've been staring at this a while and I'm not sure why I'm getting the error: "Error: Each row of output must be identified by a unique combination of keys. Keys are shared for 6440 rows: 1, 2 3, 4 5, 6 7, 8 9, 10 11, 12 13, 14 ..."

When I run the first chunk here, I notice that the values in the variable column are the labels HHincome and HHvalue instead of the codes B19013... Does that have something to do with why I'm getting this message?

CenDF <- c(HHincome = "B19013_001",
           HHvalue = "B25077_001")

county_HH <- get_acs(geography = "county",
                     year = 2017,
                     survey = "acs5",
                     variables = CenDF,
                     geometry = T)
head(county_HH)
county_HH <- county_HH %>%
  mutate(variable = case_when(
    variable == "B19013_001" ~ "HHincome",
    variable == "B25077_001" ~ "HHvalue")) %>%
  select(-moe) %>%
  spread(variable, estimate) %>%
  mutate(HHInc_HousePrice_Ratio = round(HHincome / HHvalue * 100, 2))

head(county_HH)

ecking commented 4 years ago

Hi guys, I solved the issue, and I think it's important to note this as I don't recall it being in the lecture (I could be wrong). In the first chunk of this code I had B19013_001. I added an E onto it and that code chunk worked. In the lower code chunk, the variable does NOT have an E.

lecy commented 4 years ago

@Anthony-Howell-PhD just making sure you are receiving these notifications?

AntJam-Howell commented 4 years ago

@ecking Thanks for your post, and I'm glad you attempted to resolve the problem. It's important to note, though, that when you changed the variable name from B19013_001 to B19013_001E, you did not solve the problem; you actually called upon a new variable.

Let me explain the actual problem and the proper way to troubleshoot it. When you create CenDF, you convert the variable names from their original format (e.g. B19013_001) to an easier-to-read format (e.g. HHincome). You can confirm this by running head(county_HH) and looking at the variable column.

In the next step, you include the following code, which is not needed and should be removed: mutate(variable = case_when(variable == "B19013_001" ~ "HHincome", variable == "B25077_001" ~ "HHvalue")) %>%

What mutate is doing here is taking the value of the original variable (e.g. B19013_001) and converting it to an easier-to-read format (e.g. HHincome). However, you already did this when you created CenDF, so when you try to mutate again there is no B19013_001 name in the data. Neither case_when condition matches, so every value of variable becomes NA, and spread then finds two rows per county with the same key. That is why you get the error.

Please remove the mutate part of the code, and re-run the analysis that you posted.
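To see the mechanism without calling the Census API, here is a minimal sketch on a made-up data frame. The `toy` table, its GEOIDs, and its estimates are hypothetical stand-ins for what get_acs returns when `variables` is a named vector; only the recode and the spread call mirror the code in this thread.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for get_acs() output: the `variable` column already
# holds the names from CenDF, not the Census codes. Values are made up.
toy <- tibble(
  GEOID    = c("01001", "01001", "01003", "01003"),
  variable = c("HHincome", "HHvalue", "HHincome", "HHvalue"),
  estimate = c(50000, 150000, 60000, 200000)
)

# The redundant recode: neither condition matches any row, and case_when()
# returns NA for rows that match no condition.
recoded <- toy %>%
  mutate(variable = case_when(
    variable == "B19013_001" ~ "HHincome",
    variable == "B25077_001" ~ "HHvalue"
  ))

all_na <- all(is.na(recoded$variable))

# Each GEOID now has two rows sharing the same (NA) key, so spread() cannot
# place both estimates in one output cell. Capture the error message
# instead of stopping the script.
spread_result <- tryCatch(
  spread(recoded, variable, estimate),
  error = function(e) conditionMessage(e)
)

print(all_na)
print(spread_result)
```

Printing `recoded` before spreading is a quick way to spot this kind of problem: a `variable` column full of NA means the recode did not match anything.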

AntJam-Howell commented 4 years ago

@all @ecking. Apologies for the delay in responding. Please remember to include @Anthony-Howell-PhD so that I receive the message more quickly.

ecking commented 4 years ago

So I re-ran the code without that mutate section and I hit an error saying "object HHvalue not found". I'll have to try again after work today. But if you have any idea what that error message means, it'd be great to know!

Initially I thought the mutate section was not needed either, but I copied and pasted the two parts from the ppt so figured it must be correct if written that way. Guess I was wrong! ha.

Thanks!

AntJam-Howell commented 4 years ago

Some of the code you will be able to use exactly as is by copying and pasting from the lecture. Other code, though, will require you to look closely at it and make some minor changes, to make sure you are learning what the code is actually doing.

A hint: after you run the following code,

    CenDF <- c(HHincome = "B19013_001",
               HHvalue = "B25077_001")

    county_HH <- get_acs(geography = "county",
                         year = 2017,
                         survey = "acs5",
                         variables = CenDF,
                         geometry = T)
    head(county_HH)

you need to make sure that the values in the variable column are named correctly as HHincome and HHvalue. Then, when you use the spread(variable, estimate) command, you are transforming the data from long format to wide format, which then allows you to create a new variable HHInc_HousePrice_Ratio using mutate.
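As a rough illustration of that long-to-wide step, the sketch below runs the same select, spread, and mutate calls on a small made-up table. The `toy` data frame and all of its values are hypothetical; the real county_HH also carries NAME and geometry columns that are left out here to keep the example self-contained.

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format data shaped like get_acs() output:
# one row per county per variable. All numbers are made up.
toy <- tibble(
  GEOID    = c("01001", "01001", "01003", "01003"),
  variable = c("HHincome", "HHvalue", "HHincome", "HHvalue"),
  estimate = c(50000, 150000, 60000, 200000),
  moe      = c(1000, 5000, 1200, 6000)
)

wide <- toy %>%
  # Drop moe first: it differs between the two rows of each county, so it
  # would act as an extra key column and keep those rows from merging.
  select(-moe) %>%
  # Long to wide: one row per GEOID, with HHincome and HHvalue as columns.
  spread(variable, estimate) %>%
  # Both columns now exist, so the ratio can be computed row by row.
  mutate(HHInc_HousePrice_Ratio = round(HHincome / HHvalue * 100, 2))

print(wide)
```

If HHincome or HHvalue is missing from the column names after spread, the final mutate cannot find it, so checking names(wide) right after the spread step is a useful sanity check.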