Open sunaynagoel opened 4 years ago
@lepp12 the problem is that you've subsetted the data to only 2000 variables, run the prediction, and then you are trying to predict new data with the full dataset that includes both the 2000 and 2010 variables. I suggest instead of the following Census2000 <-census.dats
you may want to create 2 separate datasets, 1 for 2010 and 1 for 2000, and make sure that the same variables are included in both.
@castower There are several options, all of which are quite common, and you are free to choose what you think is best: windsorize the variable, trim the variable, remove the outlier outright, or take the log of the variable.
@castower There are several options, all of which are quite common, and you are free to choose what you think is best: windsorize the variable, trim the variable, remove the outlier outright, or take the log of the variable.
@Anthony-Howell-PhD thank you!
@meliapetersen you can remove the ### Identifying Communities. There is no output to show there.
@Anthony-Howell-PhD It looks like below there is a place to "interpret results" before the cluster analysis. If there is no output to show, are there results to interpret?
The main point of that section is to define and label each of the cluster groupings, which can be done on the side panel of each cluster output figure. You are free to keep in the ### identifying communities and add some basic description of what you did, i.e. perform cluster analysis, but there is no visualization for now. You can add your own visualization if you want though.
@meliapetersen see above reply
Hi, I'm having trouble identifying the variables needed for the dorling cartograms. Can someone help me make a little but more sense of it in regards to how to merge the spatial information to the Census2010 data frame. I keep going back to Lab 4 and what we've coded thus far and I'm not fully understanding what I'm supposed to do. Thank you! :)
@meliapetersen I would suggest getting the spatial data information using get_acs for your state. Once you have that, you will want to merge your census dataframe to the spatial dataframe.
SpatialData <-
get_acs( geography = "tract", variables = "B01003_001", state = "??", geometry = TRUE ) %>%
select( GEOID, estimate )
SpatialData<-merge(SpatialData, CensusDataframeNAME,all.x='GEOID',all.y='TRTID10')
That should get you started
@meliapetersen If helpful, the course GitHub site has information on how the dorling cartograms were built for labs 3 and 4. That code might be instructive:
https://github.com/DS4PS/cpp-529-master/blob/master/data/README.md
@meliapetersen I would suggest getting the spatial data information using get_acs for your state. Once you have that, you will want to merge your census dataframe to the spatial dataframe.
SpatialData <- get_acs( geography = "tract", variables = "B01003_001", state = "??", geometry = TRUE ) %>% select( GEOID, estimate ) SpatialData<-merge(SpatialData, CensusDataframeNAME,all.x='GEOID',all.y='TRTID10')
That should get you started
Awesome, thank you! I think I figured it out, but I'm getting an error when I run my code:
census_api_key("b431c35dad89e2863681311677d12581e8f24c24")
options(tigris_use_cache = TRUE)
seattle.pop <-
get_acs( geography = "tract", variables = "B01003_001",
state = "53", geometry = TRUE ) %>%
select( GEOID, estimate )
seattle.pop$GEOID<-substring(seattle.pop$GEOID, 2)
seattle <- merge( seattle.pop, Census2010, by.x="GEOID", by.y="TRTID10" )
seattle2 <- seattle[ ! st_is_empty( seattle ) , ]
seattle.sp <- as_Spatial( seattle2 )
class( seattle.sp )
seattle.sp <- spTransform( seattle.sp, CRS("+init=epsg:3395"))
seattle.sp <- seattle.sp[ seattle.sp$POP != 0 & (! is.na( seattle.sp$POP )) , ]
seattle.sp$pop.w <- seattle.sp$POP / 9000 # max(msp.sp$POP) # standardizes it to max of 1.5
seattle_dorling <- seattle_dorling( x=seattle.sp, weight="pop.w", k=0.05 )
tm_shape( seattle_dorling ) +
tm_polygons( size="POP", col="cluster", n=4, style="cat", palette="Spectral")
Error:
Error in st_cast_sfc_default(x) : list item(s) not of class sfg
I think you are missing an argument here:
substring( seattle.pop$GEOID, 2 )
You usually have a starting position and ending position for the substring() function.
What does the Census2010
tract ID TRTID10
look like? Is it state, county, and tract FIP IDs, or just one of them?
SS-CCC-TTTTTT
I am having trouble with the same area (creating dorling):
census_api_key("42bf5fcc6e6a6f05ebe97a0e647a5216a708613a")
aus.pop <-
get_acs( geography = "tract", variables = "B01003_001",
state = "48", county = county.fips[state.fips=="48"], geometry = TRUE ) %>%
select( GEOID, estimate )
aus <- merge( aus.pop, austin.data, by.x="GEOID", by.y="TRTID10" )
aus.sp <- as_Spatial( aus )
class( aus.sp )
aus.sp <- spTransform( aus.sp, CRS("+init=epsg:3395"))
aus.sp <- aus.sp[ aus.sp$POP != 0 & (! is.na( aus.sp$POP )) , ]
aus_dorling <- cartogram_dorling( x=aus.sp, weight="pop.w", k=0.05 )
I get the error: Error in packcircles::circleRepelLayout(x = dat.init, xysizecols = 1:3, : all sizes are missing and/or non-positive
@lecy So I noticed that I didn't have the county = county.fips[state.fips=="53"]
argument in my seattle.pop code, but I don't know if that's what you were talking about because that didn't fix the issue. I'm still getting the same argument.
code here:
census_api_key("b431c35dad89e2863681311677d12581e8f24c24")
options(tigris_use_cache = TRUE)
seattle.pop <-
get_acs( geography = "tract", variables = "B01003_001",
state = "53", county = county.fips[state.fips=="53"], geometry = TRUE ) %>%
select( GEOID, estimate )
seattle.pop$GEOID<-substring(seattle.pop$GEOID, 2)
seattle <- merge( seattle.pop, Census2010, by.x="GEOID", by.y="TRTID10" )
seattle2 <- seattle[ ! st_is_empty( seattle ) , ]
seattle.sp <- as_Spatial( seattle2 )
class( seattle.sp )
seattle.sp <- spTransform( seattle.sp, CRS("+init=epsg:3395"))
seattle.sp <- seattle.sp[ seattle.sp$POP != 0 & (! is.na( seattle.sp$POP )) , ]
seattle.sp$pop.w <- seattle.sp$POP / 9000 # max(msp.sp$POP) # standardizes it to max of 1.5
seattle_dorling <- seattle_dorling( x=seattle.sp, weight="pop.w", k=0.05 )
tm_shape( seattle_dorling ) +
tm_polygons( size="POP", col="cluster", n=4, style="cat", palette="Spectral")
I'm not quite sure I understand what I need to add to the substring argument.
@RickyDuran Do you have a POP variable? I might have renamed it from the default census name.
Did you create the weighted pop variable?
phx$pop.w <- phx$POP / 10000 # standardizes it to max of 1.5
You might have to adjust the denominator depending on the max population. If you also have an NA or a 0 for population it could create the error you are getting. You should drop those polygons (filter them out) before the conversion.
summary( phx$POP )
@meliapetersen The get_acs()
function requires both state and county fips. Here is how they are being generated from the original GEOID (which is state-county-tract FIPS combined):
crosswalk <- read.csv( "https://raw.githubusercontent.com/DS4PS/cpp-529-master/master/data/cbsatocountycrosswalk.csv", stringsAsFactors=F, colClasses="character" )
these.msp <- crosswalk$msaname == "MINNEAPOLIS-ST. PAUL, MN-WI"
these.fips <- crosswalk$fipscounty[ these.msp ]
these.fips <- na.omit( these.fips )
state.fips <- substr( these.fips, 1, 2 )
county.fips <- substr( these.fips, 3, 5 )
data.frame( these.fips, state.fips, county.fips ) %>% pander()
these.fips | state.fips | county.fips | |
---|---|---|---|
1 | 27003 | 27 | 003 |
2 | 27019 | 27 | 019 |
3 | 27025 | 27 | 025 |
4 | 27037 | 27 | 037 |
5 | 27053 | 27 | 053 |
6 | 27059 | 27 | 059 |
7 | 27123 | 27 | 123 |
8 | 27139 | 27 | 139 |
9 | 27141 | 27 | 141 |
10 | 27163 | 27 | 163 |
11 | 27171 | 27 | 171 |
12 | 55093 | 55 | 093 |
13 | 55109 | 55 | 109 |
If your MSA spans two states then you need to split these codes into two separate calls to get_acs(), I believe. One for each state using the corresponding county FIPS. Recall that county FIPS are not unique since each state will have a 001, 002, etc.
> county.fips
[1] "003" "019" "025" "037" "053" "059" "123" "139" "141" "163" "171" "093"
[13] "109"
> county.fips[ state.fips=="55" ]
[1] "093" "109"
Substring is pulling out each FIPS.
@lecy So I fixed the issue I was having, and now it's giving me a different error. I added:
crosswalk <- read.csv( "https://raw.githubusercontent.com/DS4PS/cpp-529-master/master/data/cbsatocountycrosswalk.csv", stringsAsFactors=F, colClasses="character" )
these.seattle <- crosswalk$msaname == "SEATTLE-BELLEVUE-EVERETT, WA"
these.fips <- crosswalk$fipscounty[ these.seattle ]
these.fips <- na.omit( these.fips )
state.fips <- substr( these.fips, 1, 2 )
county.fips <- substr( these.fips, 3, 5 )
name.fips <- crosswalk$countyname[these.seattle]
census_api_key("b431c35dad89e2863681311677d12581e8f24c24")
options(tigris_use_cache = TRUE)
And now it is giving me an issue with this line of code:
seattle.sp$pop.w <- seattle.sp$POP / 9000 # max(msp.sp$POP) # standardizes it to max of 1.5
seattle_dorling <- cartogram_dorling( x=seattle.sp, weight="pop.w", k=0.05 )
Giving me this error: [1] "SpatialPolygonsDataFrame" attr(,"package") [1] "sp" Error in x@polygons[[1]] : subscript out of bounds
Here is both code chunks now:
crosswalk <- read.csv( "https://raw.githubusercontent.com/DS4PS/cpp-529-master/master/data/cbsatocountycrosswalk.csv", stringsAsFactors=F, colClasses="character" )
these.seattle <- crosswalk$msaname == "SEATTLE-BELLEVUE-EVERETT, WA"
these.fips <- crosswalk$fipscounty[ these.seattle ]
these.fips <- na.omit( these.fips )
state.fips <- substr( these.fips, 1, 2 )
county.fips <- substr( these.fips, 3, 5 )
name.fips <- crosswalk$countyname[these.seattle]
census_api_key("b431c35dad89e2863681311677d12581e8f24c24")
options(tigris_use_cache = TRUE)
seattle.pop <-
get_acs( geography = "tract", variables = "B01003_001",
state = "53", county = county.fips[state.fips=="53"], geometry = TRUE ) %>%
select( GEOID, estimate )
seattle.pop$GEOID<-substring(seattle.pop$GEOID, 1)
seattle <- merge( seattle.pop, Census2010, by.x="GEOID", by.y="TRTID10" )
seattle2 <- seattle[ ! st_is_empty( seattle ) , ]
seattle.sp <- as_Spatial( seattle2 )
class( seattle.sp )
seattle.sp <- spTransform( seattle.sp, CRS("+init=epsg:3395"))
seattle.sp <- seattle.sp[ seattle.sp$POP != 0 & (! is.na( seattle.sp$POP )) , ]
seattle.sp$pop.w <- seattle.sp$POP / 9000 # max(msp.sp$POP) # standardizes it to max of 1.5
seattle_dorling <- cartogram_dorling( x=seattle.sp, weight="pop.w", k=0.05 )
tm_shape( seattle_dorling ) +
tm_polygons( size="POP", col="cluster", n=4, style="cat", palette="Spectral")
I think you are missing an argument here:
substring( seattle.pop$GEOID, 2 )
You usually have a starting position and ending position for the substring() function.
What does the
Census2010
tract IDTRTID10
look like? Is it state, county, and tract FIP IDs, or just one of them?SS-CCC-TTTTTT
I am having the same issue that Melia is having, but adding the code she added to fix this issue did not work for me.
My code looks like this:
crosswalk <- read.csv( "https://raw.githubusercontent.com/DS4PS/cpp-529-master/master/data/cbsatocountycrosswalk.csv", stringsAsFactors=F, colClasses="character" )
these.chi <- crosswalk$msaname == "CHICAGO, IL"
these.fips <- crosswalk$fipscounty[ these.chi ]
these.fips <- na.omit( these.fips )
state.fips <- substr( these.fips, 1, 2 )
county.fips <- substr( these.fips, 3, 5 )
name.fips <- crosswalk$countyname[these.chi]
census_api_key("624bc0325068577dab800279b9251a06f1200af3")
options(tigris_use_cache = TRUE)
chi.pop <-
get_acs( geography = "tract", variables = "B01003_001",
state = "06", county = county.fips[state.fips=="17"], geometry = TRUE ) %>%
select( GEOID, estimate ) %>%
dplyr::rename(POP = estimate)
# merge shapefile dT with census data in new datframe
chi.pop$GEOID<-substring(chi.pop$GEOID, 2)
chi <- merge( chi.pop, census.dats, by.x="GEOID", by.y="TRTID10" )
chi2 <- chi[! st_is_empty(chi), ]
chi.sp <- as_Spatial( chi2 )
class( chi.sp )
# project map and remove empty tracts
chi.sp <- spTransform( chi.sp, CRS("+init=epsg:3395"))
chi.sp <- chi.sp[ chi.sp$POP != 0 & (! is.na( chi.sp$POP )) , ]
# convert census tract polygons to dorling cartogram
chi.sp$pop.w <- chi.sp$POP / 9000 # max(msp.sp$POP) # standardizes it to max of 1.5
chi_dorling <- cartogram_dorling( x=chi.sp, weight="pop.w", k=0.05 )
tm_shape( chi_dorling ) +
tm_polygons( size="POP", col="cluster", n=4, style="cat", palette="Spectral")
plot(chi.sp)
And is returning the error: "Error in st_cast_sfc_default(x) : list item(s) not of class sfg"
I do not understand what you mean about the substring argument not being complete. Am i supposed to add the FIPS within that argument?
@lecy
I believe I have pop data, I am doing the exact same thing for downloading shapefiles with popuoation data as in lab 4, although when I leave in "rename( POP=estimate )" in
aus.pop <-
get_acs( geography = "tract",
variables = "B01003_001",
state = "48",
county = county.fips[state.fips=="48"],
geometry = TRUE ) %>%
select( GEOID, estimate ) %>%
rename( POP=estimate )
I get ERROR in rename(POP = estimate) : (POP=estimate) not used
When I take it out, I also get Error in x@polygons[[1]] : subscript out of bounds at:
aus.sp$pop.w <- aus.sp$POP / 9000
Hi everyone,
As many of you are experiencing difficulties trying to recreate a dorling map using a new dataset, I’ve decided to make the dorling map optional. Part of the problem is that each chosen msa often times have unique problems that require unique troubleshooting that can cost quite a bit of time. Given the fast approaching deadline and the other requirements of the lab it’s important to push through and get the rest of the parts completed.
For any of you who have successfully created the dorling map for the final project a bonus of 10% will be applied to your final project. For those of you that opt to not creat the dorling map there is no penalty.
Hope that helps relieve the workload stress.
Best
On Thu, Dec 5, 2019 at 5:01 PM Ricky Duran notifications@github.com wrote:
@lecy https://github.com/lecy
I believe I have pop data, I am doing the exact same thing for downloading shapefiles with popuoation data as in lab 4, although when I leave in "rename( POP=estimate )" in
'aus.pop <- get_acs( geography = "tract", variables = "B01003_001", state = "48", county = county.fips[state.fips=="48"], geometry = TRUE ) %>% select( GEOID, estimate ) %>% rename( POP=estimate )' I get ERROR in rename(POP = estimate) : (POP=estimate) not used
When I take it out, I also get Error in x@polygons[[1]] : subscript out of bounds at: aus.sp$pop.w <- aus.sp$POP / 9000
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/DS4PS/cpp-529-master/issues/20?email_source=notifications&email_token=AMK2Y77BMURLL4HXOJDGXM3QXGI45A5CNFSM4JTOIFQ2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEGCR7PQ#issuecomment-562372542, or unsubscribe https://github.com/notifications/unsubscribe-auth/AMK2Y72VM426H6IOPVQ2PEDQXGI45ANCNFSM4JTOIFQQ .
-- Anthony Howell School of Public Affairs Arizona State University (W) www.tonyjhowell.com
@Anthony-Howell-PhD I just finished recording my video and it came out to right at 24 mins and 53 seconds. Is this okay?
Absolutely!
On Thu, Dec 5, 2019 at 6:28 PM Courtney notifications@github.com wrote:
@Anthony-Howell-PhD https://github.com/Anthony-Howell-PhD I just finished recording my video and it came out to right at 24 mins and 53 seconds. Is this okay?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/DS4PS/cpp-529-master/issues/20?email_source=notifications&email_token=AMK2Y7YIW2B5ZGNJL7RUGWDQXGTDDA5CNFSM4JTOIFQ2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEGCWZSI#issuecomment-562392265, or unsubscribe https://github.com/notifications/unsubscribe-auth/AMK2Y745WSTF73CQ4XS7KA3QXGTDDANCNFSM4JTOIFQQ .
-- Anthony Howell School of Public Affairs Arizona State University (W) www.tonyjhowell.com
@meliapetersen Not sure about your current error, this seems to work (but I don't know what data you were joining with the Census2010 object because you did not include that code):
library( sp ) # work with shapefiles
library( sf ) # work with shapefiles - simple features format
library( dplyr ) # data wrangling
library( tidycensus )
library( cartogram ) # spatial maps w/ tract size bias reduction
library( maptools ) # spatial object manipulation
crosswalk <- read.csv( "https://raw.githubusercontent.com/DS4PS/cpp-529-master/master/data/cbsatocountycrosswalk.csv", stringsAsFactors=F, colClasses="character" )
these.seattle <- crosswalk$msaname == "SEATTLE-BELLEVUE-EVERETT, WA"
these.fips <- crosswalk$fipscounty[ these.seattle ]
these.fips <- na.omit( these.fips )
state.fips <- substr( these.fips, 1, 2 )
county.fips <- substr( these.fips, 3, 5 )
census_api_key("b431c35dad89e2863681311677d12581e8f24c24")
options( tigris_use_cache = TRUE )
# only have one state, so can use county fips directly
seattle.pop <-
get_acs( geography = "tract", variables = "B01003_001",
state = "53", county = c("029", "033", "061"), geometry = TRUE ) %>%
select( GEOID, estimate ) %>%
rename( POP=estimate )
# I am not sure how you create Census2010 here
# seattle.pop$GEOID <- substring( seattle.pop$GEOID, 1 )
# seattle <- merge( seattle.pop, Census2010, by.x="GEOID", by.y="TRTID10" )
seattle <- seattle.pop
class( seattle.pop ) # sf
seattle2 <- seattle[ ! st_is_empty( seattle ) , ]
seattle.sp <- as_Spatial( seattle2 )
class( seattle.sp ) # sp
seattle.sp <- spTransform( seattle.sp, CRS("+init=epsg:3395"))
nrow( seattle.sp ) # 569
seattle.sp <- seattle.sp[ seattle.sp$POP != 0 & (! is.na( seattle.sp$POP )) , ]
nrow( seattle.sp ) # 567
seattle.sp$pop.w <- seattle.sp$POP / 9000 # max(msp.sp$POP) # standardizes it to max of 1.5
summary( seattle.sp$pop.w )
seattle_dorling <- cartogram_dorling( x=seattle.sp, weight="pop.w", k=0.05 )
plot( seattle_dorling, col="red" )
Absolutely! On Thu, Dec 5, 2019 at 6:28 PM Courtney @.***> wrote: @Anthony-Howell-PhD https://github.com/Anthony-Howell-PhD I just finished recording my video and it came out to right at 24 mins and 53 seconds. Is this okay? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#20?email_source=notifications&email_token=AMK2Y7YIW2B5ZGNJL7RUGWDQXGTDDA5CNFSM4JTOIFQ2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEGCWZSI#issuecomment-562392265>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AMK2Y745WSTF73CQ4XS7KA3QXGTDDANCNFSM4JTOIFQQ . -- Anthony Howell School of Public Affairs Arizona State University (W) www.tonyjhowell.com
@Anthony-Howell-PhD thank you!
@JaesaR Where does "06" come from?
these.chi <- crosswalk$msaname == "CHICAGO, IL"
these.fips <- crosswalk$fipscounty[ these.chi ]
these.fips <- na.omit( these.fips )
state.fips <- substr( these.fips, 1, 2 )
county.fips <- substr( these.fips, 3, 5 )
census_api_key("624bc0325068577dab800279b9251a06f1200af3")
options(tigris_use_cache = TRUE)
chi.pop <-
get_acs( geography = "tract", variables = "B01003_001",
state = "06", county = county.fips, geometry = TRUE ) %>%
select( GEOID, estimate ) %>%
dplyr::rename(POP = estimate)
data.frame( state.fips, county.fips )
state.fips | county.fips |
---|---|
17 | 031 |
17 | 037 |
17 | 043 |
17 | 063 |
17 | 089 |
17 | 093 |
17 | 097 |
17 | 111 |
17 | 197 |
@JaesaR this works, note the commented out code:
these.chi <- crosswalk$msaname == "CHICAGO, IL"
these.fips <- crosswalk$fipscounty[ these.chi ]
these.fips <- na.omit( these.fips )
state.fips <- substr( these.fips, 1, 2 )
county.fips <- substr( these.fips, 3, 5 )
data.frame( state.fips, county.fips )
census_api_key("624bc0325068577dab800279b9251a06f1200af3")
options(tigris_use_cache = TRUE)
chi.pop <-
get_acs( geography = "tract", variables = "B01003_001",
state = "17", county = county.fips, geometry = TRUE ) %>%
select( GEOID, estimate ) %>%
dplyr::rename(POP = estimate)
# chi.pop$GEOID<-substring(chi.pop$GEOID, 2)
# chi <- merge( chi.pop, census.dats, by.x="GEOID", by.y="TRTID10" )
chi <- chi.pop
chi2 <- chi[! st_is_empty(chi), ]
chi.sp <- as_Spatial( chi2 )
class( chi.sp )
# project map and remove empty tracts
chi.sp <- spTransform( chi.sp, CRS("+init=epsg:3395"))
chi.sp <- chi.sp[ chi.sp$POP != 0 & (! is.na( chi.sp$POP )) , ]
# convert census tract polygons to dorling cartogram
chi.sp$pop.w <- chi.sp$POP / 9000 # max(msp.sp$POP) # standardizes it to max of 1.5
chi_dorling <- cartogram_dorling( x=chi.sp, weight="pop.w", k=0.05 )
plot( chi_dorling, col="steelblue" )
@RickyDuran you did not have enough code for a reproducible example. So I'm not sure how the error is introduced, but what you have looks fine.
@Anthony-Howell-PhD I am running into following error while knitting the .rmd document.
Quitting from lines 56-78 (Final_Project_Outline_Storyboard-Goel.Rmd) Error in loadNamespace(name) : there is no package called 'lorem' Calls: ... loadNamespace -> withRestarts -> withOneRestart -> doWithOneRestart
Execution halted
When I tried to install the package "lorem", following error was produced.
Anyone else running into this issue?