Open sunaynagoel opened 4 years ago
Hello all, I'm also a bit confused on where I'm going wrong. Here is my code:
library(dplyr)  # provides %>%, mutate(), case_when(), select()
library(tidyr)
CenDF <- CenDF %>%
  mutate(variable = case_when(
    variable == "B25077_001" ~ "HouseValue",
    variable == "B19013_001" ~ "HHIncome")) %>%
  select(-moe) %>%
  spread(variable, estimate)  # spread() moves rows into columns
And it gives me the following error:
Error: Each row of output must be identified by a unique combination of keys. Keys are shared for 1832 rows:
* 1, 2
* 3, 4
* 5, 6
[the list continues in consecutive pairs; output truncated]
Okay, I figured out the issue.
Prior to this step, I ran the following code:
library(tidycensus)
dat <- c(HouseValue = "B25077_001", HHIncome = "B19013_001")
CenDF <- get_acs(geography = "tract", year = 2017, survey = "acs5",
                 variables = dat, county = "Maricopa",
                 state = "AZ", geometry = TRUE)
head(CenDF)
This had already labeled my variables as 'HouseValue' and 'HHIncome'. Thus, when I attempted to run:
library(tidyr)
CenDF <- CenDF %>%
  mutate(variable = case_when(
    variable == "B25077_001" ~ "HouseValue",
    variable == "B19013_001" ~ "HHIncome")) %>%
  select(-moe) %>%
  spread(variable, estimate)  # spread() moves rows into columns
I got an error.
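A minimal sketch of why this fails, using a toy vector rather than the real ACS data: case_when() returns NA for any value that matches none of its conditions, so once the variables are already named 'HouseValue' and 'HHIncome', every value is recoded to NA. Both rows of each tract then share the same key, which is what spread() complains about.

```r
library(dplyr)

# Toy stand-in for the 'variable' column after get_acs() has already
# applied the names supplied in 'dat' (illustrative values only).
variable <- c("HHIncome", "HouseValue", "HHIncome", "HouseValue")

# Neither condition matches any more, so every element falls through to NA.
recoded <- case_when(
  variable == "B25077_001" ~ "HouseValue",
  variable == "B19013_001" ~ "HHIncome"
)

print(recoded)
```

With every key set to NA, rows 1 and 2, rows 3 and 4, and so on become indistinguishable within a tract, which matches the pairs listed in the error message.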
Thus, I've changed my code to be as follows:
dat <- c(HouseValue = "B25077_001", HHIncome = "B19013_001")
CenDF <- get_acs(geography = "tract", year = 2017, survey = "acs5",
                 variables = dat, county = "Maricopa",
                 state = "AZ", geometry = TRUE)
head(CenDF)
Which gives an output of:
Simple feature collection with 6 features and 5 fields
geometry type: MULTIPOLYGON
dimension: XY
bbox: xmin: -112.0654 ymin: 33.46573 xmax: -111.04 ymax: 34.03733
epsg (SRID): 4269
proj4string: +proj=longlat +datum=NAD83 +no_defs
GEOID NAME variable estimate moe
1 04013010101 Census Tract 101.01, Maricopa County, Arizona HHIncome 87167 21599
2 04013010101 Census Tract 101.01, Maricopa County, Arizona HouseValue 543900 40233
3 04013010102 Census Tract 101.02, Maricopa County, Arizona HHIncome 115725 23564
4 04013010102 Census Tract 101.02, Maricopa County, Arizona HouseValue 895100 148158
5 04013030401 Census Tract 304.01, Maricopa County, Arizona HHIncome 113889 12613
6 04013030401 Census Tract 304.01, Maricopa County, Arizona HouseValue 844600 80941
geometry
1 MULTIPOLYGON (((-111.7869 3...
2 MULTIPOLYGON (((-111.7869 3...
3 MULTIPOLYGON (((-112.0654 3...
4 MULTIPOLYGON (((-112.0654 3...
5 MULTIPOLYGON (((-111.9648 3...
6 MULTIPOLYGON (((-111.9648 3...
and then the following code:
library(tidyr)
CenDF <- CenDF %>%
  select(-moe) %>%
  spread(variable, estimate)
head(CenDF)
which gives an output of:
Simple feature collection with 6 features and 4 fields
geometry type: MULTIPOLYGON
dimension: XY
bbox: xmin: -112.772 ymin: 33.46573 xmax: -111.04 ymax: 34.03733
epsg (SRID): 4269
proj4string: +proj=longlat +datum=NAD83 +no_defs
GEOID NAME HHIncome HouseValue
1 04013010101 Census Tract 101.01, Maricopa County, Arizona 87167 543900
2 04013010102 Census Tract 101.02, Maricopa County, Arizona 115725 895100
3 04013030401 Census Tract 304.01, Maricopa County, Arizona 113889 844600
4 04013030402 Census Tract 304.02, Maricopa County, Arizona 81994 473600
5 04013040502 Census Tract 405.02, Maricopa County, Arizona 40434 205300
6 04013040506 Census Tract 405.06, Maricopa County, Arizona 40978 156600
geometry
1 MULTIPOLYGON (((-111.7869 3...
2 MULTIPOLYGON (((-112.0654 3...
3 MULTIPOLYGON (((-111.9648 3...
4 MULTIPOLYGON (((-111.9958 3...
5 MULTIPOLYGON (((-112.772 33...
6 MULTIPOLYGON (((-112.3586 3...
This omits the 'mutate' function because I already renamed my variables in 'dat'.
Hope this helps!
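For anyone on a newer tidyr: spread() has been superseded by pivot_wider() since tidyr 1.0.0. Below is a rough sketch of the same reshaping on a small hand-built data frame; the object names 'long' and 'wide' are made up for illustration, the values are copied from the output above, and it has not been tried here on an object that carries a geometry column. Dropping moe first matters because it differs between the two rows of a tract and would otherwise keep them from collapsing into one.

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format table: two rows per tract, as get_acs() returns.
long <- data.frame(
  GEOID    = c("04013010101", "04013010101", "04013010102", "04013010102"),
  variable = c("HHIncome", "HouseValue", "HHIncome", "HouseValue"),
  estimate = c(87167, 543900, 115725, 895100),
  moe      = c(21599, 40233, 23564, 148158)
)

# Drop the margin of error, then move each variable into its own column.
wide <- long %>%
  select(-moe) %>%
  pivot_wider(names_from = variable, values_from = estimate)

print(wide)
```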
Okay, now I have one more question for the Household Income to Home Value ratio:
Should it be HHIncome/HomeValue or HomeValue/HHIncome?
The variable label HHInc_HousePrice_Ratio suggests that it should be HHIncome/HomeValue, but the instructions say to divide home value by household income.
Any clarification would be helpful!
It should be House value / Household income to my understanding.
You're right! I re-read the instructions, and we're replicating our assigned article, which uses a house-price-to-income ratio. The variable name just threw me off a bit. Thanks @sunaynagoel
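As a quick sketch of that calculation (house value divided by household income), here it is on the first two tracts from the output above. The data frame name 'wide' is only for illustration; the column name HHInc_HousePrice_Ratio is the label mentioned in the question.

```r
# Two tracts copied from the spread() output earlier in the thread.
wide <- data.frame(
  GEOID      = c("04013010101", "04013010102"),
  HHIncome   = c(87167, 115725),
  HouseValue = c(543900, 895100)
)

# House value divided by household income, per the lab instructions.
wide$HHInc_HousePrice_Ratio <- wide$HouseValue / wide$HHIncome

print(wide)
```

A ratio above 1 means the median home costs more than one year of median household income, which is the direction the assigned article uses.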
I have a question about the datatable() function. I was able to extract the information I needed for Lab 02, but I was wondering if there is an easier way. What I did not like in this case was that the generated table did not auto-fit to the window and would not let me scroll horizontally. The table had 7 columns (including the row names), and column six (geometry) took up too much space. I did not need that column at the time, so I tried to hide it using different options available for the function (e.g., escape and other formatting options), but nothing did the trick. My questions are:
- Is there a better way to present the table in compressed form?
- How can we pick and choose the columns we want to display?
I'd also like to know this!
I used the datatable for my analysis, but I ended up omitting it from my file because I got an error message that it was too large and I needed a server.
-Courtney
@castower I was able to use it by setting geometry = FALSE in the get_acs() function. That eliminated column six, and I was able to manipulate the data faster. Then I set it back to geometry = TRUE in get_acs() to generate the map in step 4. datatable() is a very powerful and useful function, and I would like to learn how to use it. I pored over the tutorial on CRAN and GitHub queries related to this and tried different things, but could not make it work.
@castower @sunaynagoel Without affecting the initial API call for data, I used select() to deselect geometry (select(-geometry)), then piped the result into datatable(). It ended up keeping the geometry field but made it the last column, so even though the table stretches out to the right, I can work with the ratio column and the others without having to scroll the window.
Thanks @cjbecerr @sunaynagoel! I tried the pipe option and also could not get the geometry column to go away completely. For some reason, every time I try to knit the file, it keeps telling me that the data set is too large without a server (it will export, but it takes a really long time). So I just used the table to search for answers, then removed it from the file and used the order() function instead.
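One possible way around the sticky geometry column, offered as a sketch rather than the lab's intended solution: in sf objects the geometry column survives select(), but sf::st_drop_geometry() returns a plain data frame without it. The toy object below is built from made-up point coordinates (the real tracts are polygons), and 'toy_sf', 'plain', and 'keep' are names chosen for illustration. The scrollX option asks DataTables for a horizontal scrollbar.

```r
library(sf)
library(DT)

# Hypothetical stand-in for the get_acs(..., geometry = TRUE) result.
toy_sf <- st_as_sf(
  data.frame(
    NAME       = c("Tract A", "Tract B"),
    HHIncome   = c(87167, 115725),
    HouseValue = c(543900, 895100),
    lon        = c(-111.79, -112.07),  # made-up coordinates
    lat        = c(33.47, 33.60)
  ),
  coords = c("lon", "lat"),
  crs = 4269
)

# Remove the geometry column entirely; the result is an ordinary data frame.
plain <- st_drop_geometry(toy_sf)

# Pick only the columns to display, by name.
keep <- plain[, c("NAME", "HHIncome", "HouseValue")]

# Render with horizontal scrolling and a short page length.
tbl <- datatable(keep, rownames = FALSE,
                 options = list(scrollX = TRUE, pageLength = 10))
tbl
```

Dropping the geometry also shrinks the object considerably, which may help with the "too large" warning when knitting, although that has not been checked against the full Maricopa data set.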
@sunaynagoel you assigned county to two different codes of variables so you need to change one.
@katiegentry07 Thanks for catching that. I changed it.
Hi all! Just a basic question here, do we need to have the code for all 3 questions included with the RMD file? I know it says we don't need to include "additional code", but I wasn't sure what this means since otherwise it will only show the code and output from question 3.
@sunaynagoel @castower Glad to see you going for the bonus using datatable(). There are probably a couple of ways to do it, and I will post how I did it in the lab solutions on Thursday. As a hint, though, I relied on indexing to create a clean table.
@etbartell In this case I'm not looking for any additional code related to the questions 1-3. You only need to include the substantive answers to the questions.
I am having problems at step 2. My spread() function is not doing what it is supposed to do. Also, I am unable to deselect "moe". Here is my code and error.
Error : Error: Each row of output must be identified by a unique combination of keys. Keys are shared for 6440 rows:
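The code behind this particular error is not shown, so the cause here is unknown. As a general way to investigate this message, one could count how many rows share each key combination before calling spread(). The sketch below uses a made-up data frame ('long') in which one tract has had its variable names turned into NA.

```r
library(dplyr)

# Hypothetical long-format data: tract "a" has lost its variable names.
long <- data.frame(
  GEOID    = c("a", "a", "b", "b"),
  variable = c(NA, NA, "HHIncome", "HouseValue"),
  estimate = c(1, 2, 3, 4)
)

# Any GEOID/variable pair that appears more than once will trip spread().
dupes <- long %>%
  count(GEOID, variable) %>%
  filter(n > 1)

print(dupes)
```

If the result is empty, the keys are unique and the problem lies elsewhere, for example in an extra column such as moe that was never removed.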