DS4PS / cpp-529-master

Course files for CPP 529 Data Analytics Practicum focused on models of neighborhood change.
https://ds4ps.org/cpp-529-master/

Lab 02 #6

Open sunaynagoel opened 4 years ago

sunaynagoel commented 4 years ago

I am having problems at step 2. My spread() function is not doing what it is supposed to do, and I am also unable to deselect "moe". Here is my code and the error:


medianvalue <- c(MedianHHIncome   = "B25099_001",
                 MedianHouseValue = "B25077_001")

county <- get_acs(geography = "county", year = 2017, survey = "acs5",
                  variables = medianvalue, geometry = T)
head(county)
county <- county %>%
  mutate(variable = case_when(
    variable == "B25099_001" ~ "HHIncome",
    variable == "B25077_001" ~ "HHValue")) %>%
  select(-moe) %>%
  spread(variable, estimate) %>%
  mutate(house_price_to_income = round(MedianHHValue / MedianHHIncome * 100, 2))
head(county)

Error : Error: Each row of output must be identified by a unique combination of keys. Keys are shared for 6440 rows:
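
A likely cause, judging from the code above: because `variables` is a named vector, get_acs() already fills the `variable` column with "MedianHHIncome" and "MedianHouseValue" instead of the raw table codes. None of the case_when() conditions match, so every label becomes NA, and each county ends up with two rows sharing the key (GEOID, NA), which is exactly what spread() complains about. The last mutate() also refers to `MedianHHValue`, a column that would never exist (the name given was `MedianHouseValue`). The sketch below reproduces both points on a small hypothetical tibble (made-up values, no API call) and shows a version that skips the relabel step. It assumes dplyr and tidyr are installed. If select(-moe) itself throws an error, one common cause is another loaded package masking dplyr's select(); writing dplyr::select() rules that out.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy rows shaped like get_acs() output when `variables` is a
# named vector: `variable` already holds the names, not the table codes.
long <- tibble(
  GEOID    = c("01001", "01001", "01003", "01003"),
  variable = c("MedianHHIncome", "MedianHouseValue",
               "MedianHHIncome", "MedianHouseValue"),
  estimate = c(50000, 150000, 60000, 240000),
  moe      = c(1000, 5000, 1200, 6000)
)

# 1. Matching on the raw codes finds nothing, so every label becomes NA
relabeled <- long %>%
  mutate(variable = case_when(
    variable == "B25099_001" ~ "HHIncome",
    variable == "B25077_001" ~ "HHValue"))
print(relabeled$variable)  # all NA -> duplicate (GEOID, NA) keys in spread()

# 2. Skip the relabel and use the names get_acs() already assigned
wide <- long %>%
  select(-moe) %>%
  spread(variable, estimate) %>%
  mutate(house_price_to_income =
           round(MedianHouseValue / MedianHHIncome * 100, 2))
print(wide)
```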

castower commented 4 years ago

Hello all, I'm also a bit confused on where I'm going wrong. Here is my code:

library(tidyr)
CenDF <- CenDF %>% 
  mutate(variable=case_when( 
    variable=="B25077_001" ~ "HouseValue",
    variable=="B19013_001" ~ "HHIncome")) %>%
  select(-moe) %>%  
  spread(variable, estimate)  #Spread moves rows into columns

And it gives me the following error:

Error: Each row of output must be identified by a unique combination of keys. Keys are shared for 1832 rows:
* 1, 2
* 3, 4
* 5, 6
* 7, 8
* 9, 10
…
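
One way to see which rows spread() is complaining about is to count the id/key combinations before reshaping; any combination with a count above 1 will collide. The sketch below assumes the relabel step has already turned every `variable` value into NA (hypothetical rows using GEOIDs and estimates from this thread) and uses a plain tibble rather than an sf object to keep the check simple.

```r
library(dplyr)

# Hypothetical rows after a case_when() that matched nothing:
# both rows of each tract now share the key (GEOID, NA)
relabeled <- tibble(
  GEOID    = c("04013010101", "04013010101", "04013010102", "04013010102"),
  variable = NA_character_,
  estimate = c(87167, 543900, 115725, 895100)
)

# List id/key combinations that occur more than once
dups <- relabeled %>%
  count(GEOID, variable) %>%
  filter(n > 1)
dups
```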
castower commented 4 years ago

Hello all, I'm also a bit confused on where I'm going wrong. Here is my code:

library(tidyr)
CenDF <- CenDF %>% 
  mutate(variable=case_when( 
    variable=="B25077_001" ~ "HouseValue",
    variable=="B19013_001" ~ "HHIncome")) %>%
  select(-moe) %>%  
  spread(variable, estimate)  #Spread moves rows into columns

And it gives me the following error:

Error: Each row of output must be identified by a unique combination of keys. Keys are shared for 1832 rows: * 1, 2 * 3, 4 * 5, 6 * 7, 8 * 9, 10 * 11, 12 * 13, 14 * 15, 16 * 17, 18 * 19, 20 * 21, 22 * 23, 24 * 25, 26 * 27, 28 * 29, 30 * 31, 32 * 33, 34 * 35, 36 * 37, 38 * 39, 40 * 41, 42 * 43, 44 * 45, 46 * 47, 48 * 49, 50 * 51, 52 * 53, 54 * 55, 56 * 57, 58 * 59, 60 * 61, 62 * 63, 64 * 65, 66 * 67, 68 * 69, 70 * 71, 72 * 73, 74 * 75, 76 * 77, 78 * 79, 80 * 81, 82 * 83, 84 * 85, 86 * 87, 88 * 89, 90 * 91, 92 * 93, 94 * 95, 96 * 97, 98 * 99, 100 * 101, 102 * 103, 104 * 105, 106 * 107, 108 * 109, 110 * 111, 112 * 113, 114 * 115, 116 * 117, 118 * 119, 120 * 121, 122 * 123, 124 * 125, 126 * 127, 128 * 129, 130 * 131, 132 * 133, 134 * 135, 136 * 137, 138 * 139, 140 * 141, 142 * 143, 144 * 145, 146 * 147, 148 * 149, 150 * 151, 152 * 153, 154 * 155, 156 * 157, 158 * 159, 160 * 161, 162 * 163, 164 * 165, 166 * 167, 168 * 169, 170 * 171, 172 * 173, 174 * 175, 176 * 177, 178 * 179, 180 * 181, 1

Okay, I figured out the issue.

Prior to this step, I ran the following code:

dat <- c(HouseValue = "B25077_001", HHIncome = "B19013_001")
CenDF <- get_acs(geography="tract", year=2017, survey="acs5", 
                  variables= dat, county = "Maricopa", 
                  state="AZ", geometry=T)
head(CenDF)

This had already labeled my variables as 'HouseValue' and 'HHIncome'. Thus, when I attempted to run:

library(tidyr)
CenDF <- CenDF %>% 
  mutate(variable=case_when( 
    variable=="B25077_001" ~ "HouseValue",
    variable=="B19013_001" ~ "HHIncome")) %>%
  select(-moe) %>%  
  spread(variable, estimate)  #Spread moves rows into columns

I got an error.

Thus, I've changed my code as follows. The get_acs() call stays the same:

dat <- c(HouseValue = "B25077_001", HHIncome = "B19013_001")
CenDF <- get_acs(geography="tract", year=2017, survey="acs5", 
                  variables= dat, county = "Maricopa", 
                  state="AZ", geometry=T)
head(CenDF)

Which gives an output of:

Simple feature collection with 6 features and 5 fields
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: -112.0654 ymin: 33.46573 xmax: -111.04 ymax: 34.03733
epsg (SRID):    4269
proj4string:    +proj=longlat +datum=NAD83 +no_defs
        GEOID                                          NAME   variable estimate    moe
1 04013010101 Census Tract 101.01, Maricopa County, Arizona   HHIncome    87167  21599
2 04013010101 Census Tract 101.01, Maricopa County, Arizona HouseValue   543900  40233
3 04013010102 Census Tract 101.02, Maricopa County, Arizona   HHIncome   115725  23564
4 04013010102 Census Tract 101.02, Maricopa County, Arizona HouseValue   895100 148158
5 04013030401 Census Tract 304.01, Maricopa County, Arizona   HHIncome   113889  12613
6 04013030401 Census Tract 304.01, Maricopa County, Arizona HouseValue   844600  80941
                        geometry
1 MULTIPOLYGON (((-111.7869 3...
2 MULTIPOLYGON (((-111.7869 3...
3 MULTIPOLYGON (((-112.0654 3...
4 MULTIPOLYGON (((-112.0654 3...
5 MULTIPOLYGON (((-111.9648 3...
6 MULTIPOLYGON (((-111.9648 3...

and then the following code:

library(tidyr)
CenDF <- CenDF %>% 
  select(- moe) %>%  
  spread(variable, estimate)

head(CenDF)

which gives an output of:

Simple feature collection with 6 features and 4 fields
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: -112.772 ymin: 33.46573 xmax: -111.04 ymax: 34.03733
epsg (SRID):    4269
proj4string:    +proj=longlat +datum=NAD83 +no_defs
        GEOID                                          NAME HHIncome HouseValue
1 04013010101 Census Tract 101.01, Maricopa County, Arizona    87167     543900
2 04013010102 Census Tract 101.02, Maricopa County, Arizona   115725     895100
3 04013030401 Census Tract 304.01, Maricopa County, Arizona   113889     844600
4 04013030402 Census Tract 304.02, Maricopa County, Arizona    81994     473600
5 04013040502 Census Tract 405.02, Maricopa County, Arizona    40434     205300
6 04013040506 Census Tract 405.06, Maricopa County, Arizona    40978     156600
                        geometry
1 MULTIPOLYGON (((-111.7869 3...
2 MULTIPOLYGON (((-112.0654 3...
3 MULTIPOLYGON (((-111.9648 3...
4 MULTIPOLYGON (((-111.9958 3...
5 MULTIPOLYGON (((-112.772 33...
6 MULTIPOLYGON (((-112.3586 3...

This omits the 'mutate' function because I already renamed my variables in 'dat'.

Hope this helps!
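
For reference, tidyr 1.0.0 and later mark spread() as superseded by pivot_wider(), which does the same reshape with explicit argument names. The sketch below runs it on a small hypothetical tibble shaped like the output above (plain tibble, not sf). Separately, tidycensus's get_acs() accepts `output = "wide"`, which returns one estimate column (suffix E) and one margin-of-error column (suffix M) per variable, so the reshape can be skipped entirely; that call is not run here because it needs a Census API key.

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format rows: one row per tract and variable
long <- tibble(
  GEOID    = c("A", "A", "B", "B"),
  NAME     = c("Tract A", "Tract A", "Tract B", "Tract B"),
  variable = c("HHIncome", "HouseValue", "HHIncome", "HouseValue"),
  estimate = c(50000, 200000, 40000, 120000),
  moe      = c(1000, 5000, 800, 4000)
)

# Drop the margin of error, then turn each `variable` value into a column
wide <- long %>%
  select(-moe) %>%
  pivot_wider(names_from = variable, values_from = estimate)
wide
```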

castower commented 4 years ago

Okay, now I have one more question for the Household Income to Home Value ratio:

Should it be HHIncome/HomeValue or HomeValue/HHIncome?

The variable label HHInc_HousePrice_Ratio suggests that it should be HHIncome/HomeValue, but the instructions say to divide home value by household income.

Any clarification would be helpful!

sunaynagoel commented 4 years ago

It should be House value / Household income to my understanding.

castower commented 4 years ago


You're right! I reread the instructions, and we're replicating our assigned article, which uses the house price to income ratio. The variable name just threw me off a bit. Thanks @sunaynagoel
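
As a quick check of the direction agreed on above (house value divided by household income), the sketch below applies it to the first two tracts from the spread() output posted earlier in this thread. The column name `house_price_to_income` is just an illustrative choice; multiply by 100 if the lab asks for a percentage, as in the first code snippet in this issue.

```r
library(dplyr)

# First two tracts from the spread() output earlier in this thread
tracts <- tibble(
  GEOID      = c("04013010101", "04013010102"),
  HHIncome   = c(87167, 115725),
  HouseValue = c(543900, 895100)
)

# House price to income ratio: how many years of median income a median home costs
tracts <- tracts %>%
  mutate(house_price_to_income = round(HouseValue / HHIncome, 2))
tracts$house_price_to_income
```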

sunaynagoel commented 4 years ago

I have a question about the datatable() function. I was able to extract the information I needed for Lab 02, but I was wondering if there is an easier way. What I did not like about this function in this case was that the table it generated did not fit the window and would not let me scroll horizontally. The table had 7 columns (including row names), and column six (geometry) was taking up too much space. I did not need that column at the time, and I tried to hide it using the function's options (e.g., escape and other formatting options), but nothing did the trick. My question is: is there a better way to

  1. present the table in a compressed form?
  2. pick and choose the columns we want to display?
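
A minimal sketch of both points, assuming the DT package and a non-spatial table (for example one built with geometry = F, or after dropping geometry). Column choice happens by indexing the data frame before it reaches datatable(); `rownames = FALSE` removes the extra row-name column; and the DataTables options `pageLength` and `scrollX` keep the widget short and let it scroll horizontally instead of overflowing. The `ratio` column name is illustrative; the values are tracts from this thread.

```r
library(DT)

# Hypothetical non-spatial table using two tracts from this thread
tbl <- data.frame(
  GEOID      = c("04013010101", "04013010102"),
  NAME       = c("Census Tract 101.01, Maricopa County, Arizona",
                 "Census Tract 101.02, Maricopa County, Arizona"),
  HHIncome   = c(87167, 115725),
  HouseValue = c(543900, 895100)
)
tbl$ratio <- round(tbl$HouseValue / tbl$HHIncome, 2)

# Keep only the columns to show, drop row names, allow horizontal scrolling
w <- datatable(
  tbl[, c("GEOID", "HouseValue", "HHIncome", "ratio")],
  rownames = FALSE,
  options  = list(pageLength = 10, scrollX = TRUE)
)
w
```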
castower commented 4 years ago

I'd also like to know this!

I used datatable() for my analysis, but I ended up omitting it from my file because I got an error message saying the data was too large and I needed a server.

-Courtney

On Mon, Oct 28, 2019, 9:35 PM sunaynagoel notifications@github.com wrote:

I have question related to function datatable(). I am able to extract information I needed for the lab 02 but I was wondering if there is an easier way. The part I did not like about this function in this particular case was that the table it generated did not auto fit to the window, it was not letting me scroll horizontally. The table had 7 columns (including row name) and column six (geometry) was taking too much space. I did not need that column at the time, I tried to hide it using different options ( for eg; escape and other formatting options) available for the function, but nothing did the trick. My question is there a better way to

  1. present table in compressed form.
  2. How can we pick pick and choose columns we want to display.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/DS4PS/cpp-529-master/issues/6?email_source=notifications&email_token=AM6K2WSORY7OMGPERT4CSX3QQ64SZA5CNFSM4JF5ZC22YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOECPGQ4A#issuecomment-547252336, or unsubscribe https://github.com/notifications/unsubscribe-auth/AM6K2WTNPBXAGLCBKSSKYCTQQ64SZANCNFSM4JF5ZC2Q .

sunaynagoel commented 4 years ago

@castower I was able to use it by setting geometry = F in the get_acs() function. That eliminated column 6, and I was able to manipulate the data faster. Then I switched back to geometry = T in get_acs() to generate the map in step 4. datatable() is a very powerful and useful function, and I would like to learn how to use it. I pored through the tutorials on CRAN and GitHub issues related to this and tried different things, but could not make it work.

cjbecerr commented 4 years ago

@castower @sunaynagoel Without changing the initial API call for the data, I used select(-geometry) and then piped the result into datatable(). It still kept the geometry field but moved it to the last column, so even though the table stretches out to the right, I can work with the ratio column and others without having to scroll the window.
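
What cjbecerr describes matches how sf treats its geometry column: it is "sticky", so select() on an sf object adds it back at the end even when you deselect it. sf's st_drop_geometry() returns a plain data frame without it, which should also make the table much lighter to knit. The sketch below uses a hypothetical two-feature sf object (points instead of the real tract polygons) and assumes recent versions of sf and dplyr.

```r
library(sf)
library(dplyr)

# Hypothetical stand-in for get_acs(geometry = TRUE) output
tracts <- st_sf(
  GEOID      = c("04013010101", "04013010102"),
  HHIncome   = c(87167, 115725),
  HouseValue = c(543900, 895100),
  geometry   = st_sfc(st_point(c(-111.8, 33.5)), st_point(c(-112.1, 33.6)),
                      crs = 4269)
)

# select() on an sf object keeps the sticky geometry column
kept <- tracts %>% select(-geometry)
names(kept)

# st_drop_geometry() removes it and returns a plain data.frame
dropped <- tracts %>% st_drop_geometry()
names(dropped)
```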

castower commented 4 years ago


Thanks @cjbecerr @sunaynagoel! I tried the pipe option and also could not get the geometry column to go away completely. For some reason, every time I try to knit the file it tells me that the data set is too large without a server (it will export, but takes a really long time). So I just used the table to search for answers, then removed it from the file and used order() instead.

katiegentry07 commented 4 years ago

@sunaynagoel you assigned county to two different codes of variables so you need to change one.

sunaynagoel commented 4 years ago

@katiegentry07 Thanks for catching that. I changed it.

etbartell commented 4 years ago

Hi all! Just a basic question here: do we need to include the code for all 3 questions in the RMD file? I know it says we don't need to include "additional code", but I wasn't sure what this means, since otherwise it will only show the code and output from question 3.

AntJam-Howell commented 4 years ago

@sunaynagoel @castower Glad to see you going for the bonus using datatable(). There are probably a couple of ways to do it, and I will post how I did it in the lab solutions on Thursday. As a hint, though, I relied on indexing to create a clean table.
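
One guess at what "indexing to create a clean table" could look like (a sketch, not the posted solution): base R bracket indexing with order() in the row position to sort and a character vector in the column position to keep only the columns of interest. The data are the six tracts from castower's spread() output above; the `ratio` column name is illustrative. The result is small enough to pass to datatable() or print directly.

```r
# The six tracts from the spread() output earlier in this thread
tracts <- data.frame(
  GEOID      = c("04013010101", "04013010102", "04013030401",
                 "04013030402", "04013040502", "04013040506"),
  HHIncome   = c(87167, 115725, 113889, 81994, 40434, 40978),
  HouseValue = c(543900, 895100, 844600, 473600, 205300, 156600)
)
tracts$ratio <- round(tracts$HouseValue / tracts$HHIncome, 2)

# Rows: order(-ratio) sorts from highest to lowest ratio.
# Columns: keep only GEOID and ratio. Then take the first three rows.
top3 <- tracts[order(-tracts$ratio), c("GEOID", "ratio")][1:3, ]
top3
```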

AntJam-Howell commented 4 years ago

@etbartell In this case I'm not looking for any additional code related to the questions 1-3. You only need to include the substantive answers to the questions.