DS4PS / cpp-529-master

Course files for CPP 529 Data Analytics Practicum focused on models of neighborhood change.
https://ds4ps.org/cpp-529-master/

Final Project #20

Open sunaynagoel opened 4 years ago

sunaynagoel commented 4 years ago

@Anthony-Howell-PhD I am running into the following error while knitting the .rmd document.

Quitting from lines 56-78 (Final_Project_Outline_Storyboard-Goel.Rmd)
Error in loadNamespace(name) : there is no package called 'lorem'
Calls: ... loadNamespace -> withRestarts -> withOneRestart -> doWithOneRestart
Execution halted

When I tried to install the package "lorem", the following error was produced.

 Package LibPath Version Priority Depends Imports LinkingTo Suggests Enhances License License_is_FOSS License_restricts_use OS_type Archs MD5sum
 NeedsCompilation Built

Anyone else running into this issue?

sunaynagoel commented 4 years ago

@Anthony-Howell-PhD. I was able to knit the file after including the following code.

install.packages("devtools")
devtools::install_github("gadenbuie/lorem")

Then load the library along with all the other libraries: library( lorem )
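
A minimal sketch of the same fix that only installs what is missing. The helper names `needs_install` and `install_lorem_if_missing` are made up for illustration; `requireNamespace()`, `install.packages()` and `devtools::install_github()` are the standard calls.

```r
# Hypothetical helper: TRUE when a package is not yet installed.
needs_install <- function(pkg) {
  !requireNamespace(pkg, quietly = TRUE)
}

# Wrapped in a function so nothing is downloaded until you call it.
install_lorem_if_missing <- function() {
  if (needs_install("devtools")) install.packages("devtools")
  if (needs_install("lorem")) devtools::install_github("gadenbuie/lorem")
  invisible(!needs_install("lorem"))
}

# "stats" ships with R, so it never needs installing.
needs_install("stats")
```

Calling `install_lorem_if_missing()` once in the console, rather than inside the .Rmd, avoids reinstalling the package on every knit.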

Jigarci3 commented 4 years ago

@Anthony-Howell-PhD I might be completely off on this but I am trying to subset census.dats for my MSA. Here is my code

grep("^SEA", census.dats$msaname, value = TRUE)
these.sea <- census.dats$msaname == "SEATTLE-BELLEVUE-EVERETT, WA"
these.fips <- census.dats$fipscounty[ these.sea ]
these.fips <- na.omit( these.fips )

state.fips <- substr( these.fips, 1, 2 )
county.fips <- substr( these.fips, 3, 5 )

sea.pop1 <-
  get_acs( geography = "tract", variables = "Median.HH.Value00", "Foreign.Born00", "Recent.Immigrant00", "Poor.English00", "Veteran00", "Poverty00", "Poverty.Black00", " Poverty.White00", "Poverty.Hispanic00", "Pop.Black00", "Pop.Hispanic00", "Pop.Unemp00", "Pop.Manufact00", "Pop.SelfEmp00", "Pop.Prof00", "Female.LaborForce00",
         state = "53", county = county.fips[state.fips=="53"], geometry = TRUE ) %>% 
         select( "TRTID10", estimate ) %>%
         rename( POP=estimate )

sea.pop2 <-
get_acs( geography = "tract", variables = "Median.HH.Value10", "Foreign.Born10", "Recent.Immigrant10", "Poor.English10", "Veteran10", "Poverty10", "Poverty.Black10", " Poverty.White10", "Poverty.Hispanic10", "Pop.Black10", "Pop.Hispanic10", "Pop.Unemp10", "Pop.Manufact10", "Pop.SelfEmp10", "Pop.Prof10", "Female.LaborForce10",
         state = "53", county = county.fips[state.fips=="53"], geometry = TRUE ) %>% 
         select( "TRTID10", estimate ) %>%
         rename( POP=estimate )

sea.pop <- rbind(sea.pop1, sea.pop2)

I am getting the following error: "Error in if (shift_geo) { : argument is not interpretable as logical"

I can't figure out how to correct this error or if I am on the right track with my attempt to only include Seattle data.

AntJam-Howell commented 4 years ago

@Jigarci3 You do not have to use the get_acs function to download data for the final project. The code chunk (below) gives you the 2000 and 2010 census variables. You have the census.dats dataframe that includes the tract ('TRTID10'), state ('state') and county ('county') information already. You need to subset the census.dats to include only the Seattle counties of your interest.
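
As a rough sketch of that kind of subset, assuming `census.dats` stores state abbreviations and county names as text. The toy rows, the county list and the name `seattle.dats` are illustrative only, not the real data.

```r
# Toy stand-in for census.dats; the real values are assumptions.
census.dats <- data.frame(
  TRTID10   = c("53033000100", "53061040100", "53063000200", "41051000100"),
  state     = c("WA", "WA", "WA", "OR"),
  county    = c("King County", "Snohomish County", "Spokane County", "Multnomah County"),
  Poverty00 = c(0.12, 0.08, 0.15, 0.10),
  stringsAsFactors = FALSE
)

# Placeholder list; use the counties that make up your own MSA.
msa.counties <- c("King County", "Snohomish County")

# Keep rows in the right state whose county is in the list.
keep <- census.dats$state == "WA" & census.dats$county %in% msa.counties
seattle.dats <- census.dats[keep, ]

nrow(seattle.dats)
```

With dplyr loaded, `filter(census.dats, state == "WA", county %in% msa.counties)` should give the same rows.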

sunaynagoel commented 4 years ago

@Anthony-Howell-PhD. The main (top horizontal) navigation bar is hiding the titles and descriptions of the widgets below it. Is there any way to customize it? I tried different things but could not achieve the desired results. Thanks. I am attaching a screenshot. Screen Shot 2019-12-02 at 11 48 51 AM

lecy commented 4 years ago

You can create a custom Cascading Style Sheet (CSS) to moderate this behavior (you have not learned this yet), but the easiest solution is to simplify the menu bar.

Shorten the project title ("Community Analytics Practicum Extravaganza" is tongue-in-cheek, you can change it), and consider grouping some items (can you combine clustering, neighborhoods, and neighborhood change? ).

sunaynagoel commented 4 years ago

@lecy Thank you. Shortening the menu bar helped.

sunaynagoel commented 4 years ago

I was wondering about limiting the decimal places in the table displayed using datatable() to 4 or 5. Will it affect the predictions?

AntJam-Howell commented 4 years ago

It is common to round to 2 or 3 decimal places, which should not have any noticeable effect on model outcomes or predictions.
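
One way to keep the question moot is to round only a copy used for display. A small sketch with made-up numbers (`results` and `display` are hypothetical names):

```r
# Hypothetical model output with many decimal places.
results <- data.frame(
  tract     = c("A", "B", "C"),
  predicted = c(0.123456, 0.987654, 0.555555)
)

# Round a display copy only; `results` keeps full precision.
display <- results
display$predicted <- round(display$predicted, 2)

display$predicted
results$predicted
```

For a datatable() widget, `DT::formatRound()` applies the same kind of display-only rounding without changing the underlying data.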

etbartell commented 4 years ago

@Anthony-Howell-PhD I'm having this same issue with knitting the original rmd but it was not solved with the code provided above. When I try it with this code:

knitr::opts_chunk$set(  message=F, warning=F, echo=F )

install.packages("devtools")
devtools::install_github("gadenbuie/lorem")

#Load in libraries
library( tidycensus )
library( tidyverse )
library( ggplot2 )
library( plyr )
library( stargazer )
library( corrplot )
library( purrr )
library( flexdashboard )
library( leaflet )
library( mclust )
library( pander )
library( DT )
library( lorem )

I get the following error message:

image

When I try to simply use install.packages( "lorem" ), it tells me that "package ‘lorem’ is not available (for R version 3.6.1)". Do I need to download a different version of R? I thought we were all using the same version.

AntJam-Howell commented 4 years ago

@etbartell If you cannot download and load the lorem package, the easiest thing to do is go through the .rmd file and remove the lorem calls. To find every instance, paste the following code into the search box of the .rmd file: r lorem::ipsum(paragraphs = 1)

You can then delete these calls one by one or all at once. Just remember that every time you see that code, it marks a place for you to provide your own answer. You can still return to these places by searching for the <!--- symbol that denotes the instructions.
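
If you would rather strip the calls programmatically, here is a hedged sketch. It assumes the placeholder appears as inline R wrapped in backticks exactly as written below; the toy `rmd.lines` vector stands in for what `readLines()` would return on the real file.

```r
# Toy lines standing in for the contents of the .Rmd file.
rmd.lines <- c(
  "### Overview",
  "<!--- Describe your MSA here -->",
  "`r lorem::ipsum(paragraphs = 1)`",
  "Some text I already wrote."
)

# fixed = TRUE treats the pattern literally, so the parentheses need no escaping.
placeholder <- "`r lorem::ipsum(paragraphs = 1)`"
cleaned <- gsub(placeholder, "", rmd.lines, fixed = TRUE)

# Positions of the instruction comments that still need an answer.
grep("<!---", cleaned, fixed = TRUE)
```

Placeholders written with a different `paragraphs` value would not match this exact pattern and would need their own pass.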

etbartell commented 4 years ago

That worked, thanks!

meliapetersen commented 4 years ago

I'm having a weird issue with my code from lab 4 (it didn't happen when I turned in the lab, but it's happening now).

I'm getting the error that I am not using an argument:

Error in rename(., POP = estimate) : unused argument (POP = estimate)

When running this code:

crosswalk <- read.csv( "https://raw.githubusercontent.com/DS4PS/cpp-529-master/master/data/cbsatocountycrosswalk.csv",  stringsAsFactors=F, colClasses="character" )

these.seattle <- crosswalk$msaname == "SEATTLE-BELLEVUE-EVERETT, WA"
these.fips <- crosswalk$fipscounty[ these.seattle ]
these.fips <- na.omit( these.fips )

state.fips <- substr( these.fips, 1, 2 )
county.fips <- substr( these.fips, 3, 5 )

seattle.pop <-
  get_acs( geography = "tract", variables = "B01003_001", state = "53", county = county.fips[state.fips=="53"], geometry = TRUE ) %>%
  select( GEOID, estimate ) %>%
  rename( POP = estimate )

URL <- "https://github.com/DS4PS/cpp-529-master/raw/master/data/ltdb_std_2010_sample.rds"
census.dat <- readRDS(gzcon(url( URL )))

# merge shapefile data with census data in new dataframe
seattle <- merge( seattle.pop, census.dat, by.x="GEOID", by.y="tractid" )
seattle2 <- seattle[ ! st_is_empty( seattle ) , ]
seattle.sp <- as_Spatial( seattle2 )
class( seattle.sp )

For the empirical framework portion of the dashboard.

Am I on the right track for this portion? I am also unclear on that as well. This was just the code I had from lab 4.

AntJam-Howell commented 4 years ago

@meliapetersen sorry to hear that is happening. My suggestion is to focus on understanding how to subset the census.dats dataset to only your MSA of interest. Based on your code, your counties of interest are ("029" "033" "061"). The census.dats dataframe has the actual names of the counties, not numbers. The intent was that this dilemma would lead people to search online for county FIPS codes (see my Google search screenshot attached). The first result is a concordance (also attached below). You will have to match your county FIPS codes to the names in the concordance, then subset those county names in your census.dats dataset.

Screenshot 2019-12-03 15 01 58

Countyfipconcordance.pdf

meliapetersen commented 4 years ago

I see where I'm going wrong, thank you!

castower commented 4 years ago

@etbartell I ran into the same problem and found that entering the following code fixed it:

devtools::install_github("gadenbuie/lorem")

I read here for additional info: https://github.com/gadenbuie/lorem

Edit: oops, just realized this is the exact same code as above, I somehow overlooked that!

lepp12 commented 4 years ago

@Anthony-Howell-PhD

I'm running into a similar issue as others in the section requiring code from Lab 4. However, I'm not getting a descriptive error. When I run the following code:

crosswalk <- read.csv( "https://raw.githubusercontent.com/DS4PS/cpp-529-master/master/data/cbsatocountycrosswalk.csv",  stringsAsFactors=F, colClasses="character" )

these.san <- crosswalk$msaname == "SAN DIEGO, CA"
these.fips <- crosswalk$fipscounty[ these.san ]
these.fips <- na.omit( these.fips )

state.fips <- substr( these.fips, 1, 2 )
county.fips <- substr( these.fips, 3, 5 )

san.pop <-
  get_acs( geography = "tract", variables = "B01003_001", state = "06", county = county.fips[state.fips=="06"], geometry = TRUE ) %>%
  select( GEOID, estimate ) %>%

I only get "Error: "

AntJam-Howell commented 4 years ago

You do not need to download data using get_acs. You already have the data you need with census.dats. You only need to subset the census.dats data to your chosen MSA (which is typically a few different counties). Please see the response to Melia above (pasted below) and let me know if that helps.

@meliapetersen sorry to hear that is happening. My suggestion is to focus on understanding how to subset the census.dats dataset to only your MSA of interest. Based on your code, your counties of interest are ("029" "033" "061"). The census.dats dataframe has the actual names of the counties, not numbers. The intent was that this dilemma would lead people to search online for county FIPS codes (see my Google search screenshot attached). The first result is a concordance (also attached below). You will have to match your county FIPS codes to the names in the concordance, then subset those county names in your census.dats dataset.

Screenshot 2019-12-03 15 01 58

Countyfipconcordance.pdf

AntJam-Howell commented 4 years ago

@lepp12 please see above reply.

castower commented 4 years ago

@lepp12, if you don't want to have to Google the names, they are in the crosswalk dataset. Therefore, I just altered my data frame from the crosswalk to be as follows:

name.fips <- crosswalk$countyname[these.YOURCITY]
data.frame( state=state.fips, county=county.fips, FIPS=these.fips, name=name.fips)

This then gave me the names of each county.
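
A self-contained sketch of that lookup on a toy crosswalk. The rows and the exact format of `countyname` are assumptions, and `these.msa` plays the role of `these.YOURCITY`.

```r
# Toy stand-in for the crosswalk; values are illustrative only.
crosswalk <- data.frame(
  msaname    = c("SEATTLE-BELLEVUE-EVERETT, WA",
                 "SEATTLE-BELLEVUE-EVERETT, WA",
                 "SOME OTHER MSA, WA"),
  fipscounty = c("53033", "53061", "53063"),
  countyname = c("King County", "Snohomish County", "Spokane County"),
  stringsAsFactors = FALSE
)

# which() drops rows where msaname is NA instead of returning NA indices.
these.msa  <- which(crosswalk$msaname == "SEATTLE-BELLEVUE-EVERETT, WA")
these.fips <- crosswalk$fipscounty[these.msa]

# One row per county: state code, county code, full FIPS, and name.
lookup <- data.frame(
  state  = substr(these.fips, 1, 2),
  county = substr(these.fips, 3, 5),
  FIPS   = these.fips,
  name   = crosswalk$countyname[these.msa],
  stringsAsFactors = FALSE
)

lookup$name
```

The `name` column can then be used as the list of counties to keep when subsetting census.dats.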

AntJam-Howell commented 4 years ago

Nice find @castower

meliapetersen commented 4 years ago

I'm still having trouble understanding what I'm supposed to do with the names of the counties and pulling them from census.dats . I have identified the fip names, but is there a specific place I can refer to for an explanation of the code to pull just the select info for the rest of the dashboard? It feels like such a simple answer but I cannot seem to make sense of it. Thank you!

castower commented 4 years ago

@meliapetersen I used the filter function to just select the needed counties

sunaynagoel commented 4 years ago

I am a little lost reading the transition matrix. Here is a screenshot of my transition matrix.

Screen Shot 2019-12-03 at 6 53 04 PM

AntJam-Howell commented 4 years ago

Example: looking at the last row, 80.6 percent of tracts classified as cluster 4 in 2000 were also classified as cluster 4 in 2010. 12.9 percent moved into cluster 3, 6.4 percent moved into cluster 2, and no tracts moved into cluster 1. How your clusters are defined will help explain the meaning of these transitions. Note: the diagonal values give the share of tracts that remained in the same cluster in 2000 and 2010.
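
To see where such row percentages come from, here is a small sketch with made-up cluster labels. It uses base R `table()` and `prop.table()`, which may differ from how the project .rmd builds its matrix.

```r
# Hypothetical cluster labels for six tracts in each year.
cluster2000 <- c(1, 1, 2, 2, 2, 3)
cluster2010 <- c(1, 2, 2, 2, 3, 3)

# Rows are the 2000 cluster, columns are the 2010 cluster.
counts <- table(from2000 = cluster2000, to2010 = cluster2010)

# margin = 1 divides each row by its row total, so every row sums to 100.
trans <- round(100 * prop.table(counts, margin = 1), 1)
trans
```

Each row answers the question "of the tracts that started in this cluster, what share ended up in each cluster?"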

etbartell commented 4 years ago

I'm having trouble understanding the change variables conceptually. If we were going for percent change, we would just use (2010var-2000var)/2000var, but since we're using the formula of 2000var/(2010var+1), I don't understand what the values are telling us. With the exception of home price, the other variables are all decimals, and adding 1 to the denominator completely alters its value. For example, if ForeigBornChange = 0.095, this doesn't mean that the foreign-born population changed by 9.5%. It's just what the formula spit out. I feel like I'm missing something. Does anyone have a solid grasp of what these variables mean?

castower commented 4 years ago

I have a question concerning the dorling maps. In Lab 4 we were creating them based on household income, but I'm not sure what we're clustering here. Should we group these by the cluster variable or something else? I may be overlooking a step, but I can't quite figure out what I'm plotting.

Thanks!

AntJam-Howell commented 4 years ago

@etbartell Nice question here and nice catch. Actually, it is more intuitive to have the change variables defined as 2010var/2000var rather than in the .rmd file which has it as 2000var/2010var. With respect to adding a constant to a variable, in this case it would be better to add a small value to the variables. So for home values, adding a 1 makes sense. When working with proportions it makes more sense to add a .01 instead of 1. I will update these changes to the .rmd file.
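
A rough sketch of the revised change variables with made-up tract values. Placing the constant in the denominator mirrors the original formula and is an assumption, since the thread does not show the updated .rmd; `change_ratio` is a hypothetical helper name.

```r
# Hypothetical helper: ratio of 2010 to 2000, with a small constant
# in the denominator so a zero in 2000 does not divide by zero.
change_ratio <- function(v00, v10, const) {
  v10 / (v00 + const)
}

# Made-up values for three tracts.
home.value00   <- c(150000, 200000, 0)
home.value10   <- c(180000, 190000, 250000)
foreign.born00 <- c(0.10, 0.00, 0.25)
foreign.born10 <- c(0.12, 0.05, 0.20)

# 1 for dollar values, 0.01 for proportions.
home.change    <- change_ratio(home.value00, home.value10, 1)
foreign.change <- change_ratio(foreign.born00, foreign.born10, 0.01)
```

Values above 1 indicate growth between 2000 and 2010, values below 1 a decline. The constant distorts tracts with very small 2000 values the most, which is worth remembering when interpreting extreme ratios.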

sunaynagoel commented 4 years ago

This makes so much more sense now. Thanks @etbartell for asking this question and @Anthony-Howell-PhD for the help.

AntJam-Howell commented 4 years ago

@castower Besides household income, we also used dorling to map clusters in Lab 4. see the attached screenshot from lab 4 instructions.

Screenshot 2019-12-03 19 50 10

castower commented 4 years ago

@Anthony-Howell-PhD Thank you! I have another question about the data tab of the flexdashboard. Should there be labels on the blue tabs? I can't figure out how to name them. Screen Shot 2019-12-03 at 7 41 11 PM

etbartell commented 4 years ago

Thanks! That makes so much more sense now.

sunaynagoel commented 4 years ago

@castower I had the same issue. Shortening the title and removing items from the menu bar helped. The labels are the text after ###. Hope this helps.

castower commented 4 years ago

Thank you! That worked! @sunaynagoel

Jigarci3 commented 4 years ago

I think I have a better understanding on subsetting the data- however, I am getting an error with my code.

library(dplyr)

colnames(census.dats) <- c("TRTID10", "state", "county", "Median.HH.Value00", "Foreign.Born00", "Recent.Immigrant00", "Poor.English00", "Veteran00", "Poverty00", "Poverty.Black00", "Poverty.White00", "Poverty.Hispanic00", "Pop.Black00", "Pop.Hispanic00", "Pop.Unemp00", "Pop.Manufact00", "Pop.SelfEmp00", "Pop.Prof00", "Female.LaborForce00", "Median.HH.Value10", "Foreign.Born10", "Recent.Immigrant10", "Poor.English10","Veteran10", "Poverty10", "Poverty.Black10", "Poverty.White10", "Poverty.Hispanic10", "Pop.Black10", "Pop.Hispanic10", "Pop.Unemp10", "Pop.Manufact10", "Pop.SelfEmp10", "Pop.Prof10", "Female.LaborForce10")

seattle.msa <- select(filter(census.dats, state== "WA" & county== "King County"| county== "Snohomish"| county == "Pierce County", select==c(TRTID10, state, county, Median.HH.Value00, Foreign.Born00, Recent.Immigrant00, Poor.English00, Veteran00, Poverty00, Poverty.Black00, Poverty.White00, Poverty.Hispanic00, Pop.Black00, Pop.Hispanic00, Pop.Unemp00, Pop.Manufact00, Pop.SelfEmp00, Pop.Prof00, Female.LaborForce00, Median.HH.Value10, Foreign.Born10, Recent.Immigrant10, Poor.English10,Veteran10, Poverty10, Poverty.Black10, Poverty.White10, Poverty.Hispanic10)))

I receive the following error: Error: Result must have length 71413, not 1999564.

Has anyone run into this and any idea what I am missing here?

Update: Finally figured it out!

katiegentry07 commented 4 years ago

I am working to keep all of the data for the Portland-Vancouver MSA and it looks like it is filtering some of the data out. Is there a way to solve this? If I run the individual counties, there are many more TRTID10 kept in the data set. I'm not sure how to avoid filtering out some of this data when I believe I am on the right track for keeping the MSA overall. My code is below.

portland.data <- filter(census.dats, 
     state == c("OR", "WA"), 
     county == c("Clackamas County",  "Columbia County", 
        "Multnomah County", "Washington County", 
        "Yamhill County", "Clark County") )
portland.data

lecy commented 4 years ago

In logical statements == works when you have a single criterion. It fails when you have several:

c("A","B","C") == "A"
TRUE FALSE FALSE

c("A","B","C") == "B"
FALSE  TRUE FALSE

c("A","B","C") == c("B","A")
FALSE FALSE FALSE

When using multiple criteria you can use the %in% operator:

c("A","B","C") %in% c("B","A")
TRUE  TRUE FALSE

I'm not sure if that is the fix, but a good reminder nonetheless!
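
Applied to a data frame column, the same recycling silently drops rows. A toy illustration (county names are placeholders, not a claim about any MSA):

```r
# Toy data: six tracts across three counties.
dat <- data.frame(
  county = c("Clark County", "Clark County", "Yamhill County",
             "Yamhill County", "Lane County", "Lane County"),
  stringsAsFactors = FALSE
)
wanted <- c("Clark County", "Yamhill County")

# == recycles `wanted` down the column: row 1 is compared to wanted[1],
# row 2 to wanted[2], row 3 to wanted[1] again, and so on.
n.equals <- sum(dat$county == wanted)

# %in% asks, for each row, whether its county appears anywhere in `wanted`.
n.in <- sum(dat$county %in% wanted)

c(equals = n.equals, in.operator = n.in)
```

Here == keeps fewer rows than %in%, and because the column length is a multiple of the length of `wanted`, R gives no warning about it.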

meliapetersen commented 4 years ago

I'm having issues figuring out the Identifying Communities tab in the dashboard under Clustering . In the example, when I knit the document the tab shows up as blank, is there supposed to be a visualization of the data? In lab 6 it looks like it's just the clustering code, but in the notes it says it's a visualizing tool.

#Visualize Data
stats1 <- 
  Census2010 %>% 
  group_by( cluster ) %>% 
  select(keep.these1)%>% 
  summarise_each( funs(mean) )

t <- data.frame( t(stats1), stringsAsFactors=F )
names(t) <- paste0( "GROUP.", 1:3 )
t <- t[-1,]

I changed the code from 1:4 to 1:3 because that's how many groups were created when I clustered my data. I'm not quite sure if that is correct either.
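
One way to avoid hard-coding the number of groups is to take the labels from the data. A sketch under assumptions: the toy `Census2010` below is made up, and base R `aggregate()` is swapped in for the dplyr pipeline above.

```r
# Hypothetical clustered data with three clusters.
Census2010 <- data.frame(
  cluster   = c(1, 1, 2, 2, 3, 3),
  Poverty10 = c(0.10, 0.20, 0.30, 0.40, 0.05, 0.15),
  Veteran10 = c(0.08, 0.12, 0.02, 0.04, 0.10, 0.20)
)
keep.these1 <- c("Poverty10", "Veteran10")

# Mean of each variable within each cluster.
stats1 <- aggregate(Census2010[keep.these1],
                    by = list(cluster = Census2010$cluster), FUN = mean)

# Variables as rows, clusters as columns; column names come from the
# cluster labels actually present, however many there are.
tab <- data.frame(t(stats1[keep.these1]))
names(tab) <- paste0("GROUP.", stats1$cluster)
tab
```

If the clustering step returns three groups, this yields three columns without editing the code.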

lepp12 commented 4 years ago

In the Mapping Clusters section I have reviewed all of Lab 4. I understand how to merge the spatial information to the Census2010 dataframe. What I'm not understanding is where I'm supposed to be getting the spatial information and what by.x and by.y are for in the merge.

AntJam-Howell commented 4 years ago

@meliapetersen you can remove the ### Identifying Communities. There is no output to show there.

AntJam-Howell commented 4 years ago

@lepp12 Do you see the following code snippet from Lab 4? In the get_acs function, setting geometry=TRUE is where the spatial data comes from.

msp.pop2 <-
get_acs( geography = "tract", variables = "B01003_001",
     state = "55", county = county.fips[state.fips=="55"], geometry = TRUE )

AntJam-Howell commented 4 years ago

@lepp12 Do you recall the following code snippet from Lab 4? by.x is the name of the matching variable in the first dataset passed to merge, in this case msp.pop. by.y is the name of the matching variable in the second dataset, in this case census.dat. Here GEOID and tractid refer to the same thing, the tract code, but are given different names in each of our two datasets.

msp <- merge( msp.pop, census.dat, by.x="GEOID", by.y="tractid" )
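
A toy version of the same merge, with made-up rows and a hypothetical `income` column, shows what by.x and by.y do when the key columns have different names:

```r
# Two small tables whose key columns are named differently.
msp.pop <- data.frame(
  GEOID = c("001", "002", "003"),
  POP   = c(1200, 3400, 560),
  stringsAsFactors = FALSE
)
census.dat <- data.frame(
  tractid = c("002", "003", "004"),
  income  = c(52000, 61000, 47000),
  stringsAsFactors = FALSE
)

# Match GEOID in the first table against tractid in the second.
msp <- merge(msp.pop, census.dat, by.x = "GEOID", by.y = "tractid")
msp
```

Only tracts present in both tables survive by default, and the key column in the result takes the name given in by.x.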

lepp12 commented 4 years ago

@Anthony-Howell-PhD I do see those code snippets. I thought you had mentioned in a previous response the get_acs call wasn't needed. Thank you for the help!

AntJam-Howell commented 4 years ago

No worries. I think the earlier comment was about downloading the census data itself rather than obtaining the spatial data. In that case get_acs is not needed because I had already provided all of the census data needed. Hope that helps.

castower commented 4 years ago

Is there any way to set ggplot to not cut off the titles of my labels on the histogram grid? They look fine in RMarkdown, but when I knit the file some of the title labels are cut off:

Screen Shot 2019-12-04 at 1 26 09 PM

AntJam-Howell commented 4 years ago

There is a way to do it. You could troubleshoot it with a Google search, but the easiest and perhaps more informative approach is to change the variable names, either directly in the data or indirectly through ggplot. I googled "change variable names in ggplot" and the first result is the following link, which may get you started (Link https://stackoverflow.com/questions/52656493/renaming-variable-names-in-a-ggplot2 )

lepp12 commented 4 years ago

@Anthony-Howell-PhD I was able to get the spatial information and get it merged with census.dats. However, I get the error

Error in predict.Mclust(mod2, Census2000[keep.these00]) : newdata must match ncol of object data

when running the code below. When I re-download census.dats, the problem does not occur. Is there an issue with the way I merged?


Census2000 <-census.dats

keep.these00 <-c("Foreign.Born00","Recent.Immigrant00","Poor.English00","Veteran00","Poverty00","Poverty.Black00","Poverty.White00","Poverty.Hispanic00","Pop.Black00","Pop.Hispanic00","Pop.Unemp00","Pop.Manufact00","Pop.SelfEmp00","Pop.Prof00","Female.LaborForce00")

pred00<-predict(mod2, Census2000[keep.these00])

Census2000$PredCluster <- pred00$classification

TransDF2000<-Census2000 %>%
  select(TRTID10, PredCluster)

TransDF2010<-Census2010 %>%
  select(TRTID10, cluster,Median.HH.Value10) 

TransDFnew<-merge(TransDF2000,TransDF2010,by.all="TRTID10",all.x=TRUE)

castower commented 4 years ago

Thank you!

One other question, I discovered that my data set has one massive outlier for the House Price change variable (there's an instance where in 2000 the median house price was only $300 and in 2010 it was $284,900). Should I exclude this outlier since it's skewing the data (especially the mean) or just mention it in my summary?

Thanks!

castower commented 4 years ago

If anyone else has questions about changing the grid labels, this website has great instructions: https://www.datanovia.com/en/blog/how-to-change-ggplot-facet-labels/

castower commented 4 years ago

Also note that if you want to leave the variable names alone, you can use the fig.width chunk option in R Markdown to widen the figure.

sunaynagoel commented 4 years ago

Thanks @castower