This is a collection of countywide data from the US Census and the Yale Center for Climate Change Communication.
Correlation Code #1

Open braedenmc1 opened 5 years ago

braedenmc1 commented 5 years ago

Dr. Soltoff,

I have added the specific code I wanted to use for my correlations. I found a tutorial online with a package similar to ggplot. I included the code and error.

This is my first time working with Git, so I don't think this showed up in the earlier code.

corr <- finalform %>%
  distinct(ruralscore, urbanscore, estimate.x, estimate.y, estimate,
           trump_per, human, harmUS, supportRPS)

data(corr)
associations <- round(cor(corr), 3)
head(associations[, 1:6])

p.mat <- cor_pmat(associations)
head(p.mat[, 1:4])


Error in cor(corr) : 'x' must be numeric

bensoltoff commented 5 years ago

I ran the code from Clean_Census.Rmd. The error itself arises because corr is a simple-features data frame, so it contains a geometry column, and cor() only accepts numeric input. corr also contains rows with missing values, and by default cor() returns NA for any coefficient that involves a variable with missing values. You need to remove the geometry column from the data frame, then remove the observations with missing values. Only then can you estimate the correlation matrix. Try this code instead:

# convert to plain tibble data frame
associations <- as_tibble(corr) %>%
  # remove geometry column
  select(-geometry) %>%
  # drop rows with missing values
  drop_na() %>%
  # estimate correlation matrix
  cor() %>%
  # round to three decimal places
  round(3)
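
To see the two problems separately, here is a minimal sketch on a made-up data frame. The names `toy`, `x`, `y`, and `label` are hypothetical and not part of your data; `label` only stands in for the geometry column. A non-numeric column triggers the `'x' must be numeric` error, while missing values on their own give `NA` coefficients rather than an error.

```r
# toy data: two numeric columns (one with a missing value) plus a
# non-numeric column standing in for the geometry column
toy <- data.frame(
  x = c(1, 2, 3, 4, 5),
  y = c(2, 4, NA, 8, 10),
  label = c("a", "b", "c", "d", "e")
)

# a non-numeric column makes cor() stop with an error;
# capture the message instead of halting the script
err <- tryCatch(cor(toy), error = function(e) conditionMessage(e))

# with only numeric columns, missing values propagate as NA coefficients
numeric_only <- toy[, c("x", "y")]
with_na <- cor(numeric_only)

# dropping incomplete rows first gives usable estimates
complete <- cor(na.omit(numeric_only))
```

This mirrors the order of the steps above: drop the non-numeric column first, then drop the incomplete rows, then call `cor()`.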

bensoltoff commented 5 years ago

Reproducible example

# load required packages
library(tidyverse)
library(tidycensus)

# get census data
census_white <- get_acs(geography = "county", 
                        variables = c(pop_white = "DP05_0037PE"), 
                        year = 2017)
census_income <- get_acs(geography = "county", 
                         variables = c(income = "B19013_001"), 
                         year = 2017)
census_grads <- get_acs(geography = "county", 
                        variables = c(grads = "DP02_0064E"), 
                        year = 2017,
                        geometry = TRUE)
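# combine data frames
# (yale, pop, geo, trump, and blsdata are not created in this snippet
# and must already be loaded)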
transform1 <- left_join(yale, pop, by = c("GeoName" = "county_name"))
transform2 <- left_join(transform1, geo, by = c("GeoName" = "NAME"))
transform3 <- left_join(transform2, trump, by = c("GeoName" = "elect_county"))
transform4 <- left_join(census_white, transform3, by = c("NAME" = "GeoName"))
transform5 <- left_join(census_income, transform4, by = c("NAME" = "NAME"))
transform6 <- left_join(transform5, blsdata, by = c("NAME" = "bls_county"))

geolabs <- get_acs(geography = "county", 
                   variables = c(totalpops = "B00001_001E"), 
                   year = 2017,
                   geometry = TRUE)
finalform <- left_join(geolabs, transform6, by = c("NAME" = "NAME"))

# generate correlation coefficients
# keep only required columns and remove duplicate observations
corr <- finalform %>%
  distinct(ruralscore, urbanscore, estimate.x, estimate.y, estimate,
           trump_per, human, harmUS, supportRPS)

# need to convert to plain tibble data frame in order
# to drop the geometry column
associations <- as_tibble(corr) %>%
  # remove geometry column
  select(-geometry) %>%
  # drop rows with missing values
  drop_na() %>%
  # estimate correlation matrix
  cor() %>%
  # round to three decimal places
  round(3)
associations
#>            ruralscore urbanscore estimate.x estimate.y estimate trump_per
#> ruralscore      1.000     -1.000     -0.369      0.198   -0.398     0.067
#> urbanscore     -1.000      1.000      0.369     -0.198    0.398    -0.067
#> estimate.x     -0.369      0.369      1.000      0.143    0.263    -0.077
#> estimate.y      0.198     -0.198      0.143      1.000   -0.174     0.134
#> estimate       -0.398      0.398      0.263     -0.174    1.000    -0.082
#> trump_per       0.067     -0.067     -0.077      0.134   -0.082     1.000
#> human          -0.440      0.440      0.306     -0.367    0.375    -0.162
#> harmUS         -0.321      0.321      0.188     -0.508    0.326    -0.177
#> supportRPS     -0.510      0.510      0.286     -0.465    0.367    -0.192
#>             human harmUS supportRPS
#> ruralscore -0.440 -0.321     -0.510
#> urbanscore  0.440  0.321      0.510
#> estimate.x  0.306  0.188      0.286
#> estimate.y -0.367 -0.508     -0.465
#> estimate    0.375  0.326      0.367
#> trump_per  -0.162 -0.177     -0.192
#> human       1.000  0.909      0.924
#> harmUS      0.909  1.000      0.807
#> supportRPS  0.924  0.807      1.000

# draw the correlation plot

Created on 2019-07-17 by the reprex package (v0.3.0)
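
For the significance matrix in the original question, note that the p-values need to come from the cleaned data itself, not from the correlation matrix. As a rough sketch in base R, pairwise p-values can be computed with `cor.test()`. The built-in `mtcars` data is used here only as a stand-in, and `p_matrix`, `dat`, `associations_demo`, and `p_demo` are hypothetical names, not functions or objects from any package or from your project.

```r
# hypothetical helper: matrix of pairwise correlation-test p-values
# (the diagonal is left as NA because a variable is not tested against itself)
p_matrix <- function(df) {
  vars <- colnames(df)
  p <- matrix(NA_real_, nrow = length(vars), ncol = length(vars),
              dimnames = list(vars, vars))
  for (i in seq_along(vars)) {
    for (j in seq_along(vars)) {
      if (i != j) {
        p[i, j] <- cor.test(df[[i]], df[[j]])$p.value
      }
    }
  }
  p
}

# stand-in data; replace with the cleaned, numeric-only data frame
dat <- mtcars[, c("mpg", "wt", "hp")]

# correlation matrix and matching p-value matrix, both built from the data
associations_demo <- round(cor(dat), 3)
p_demo <- p_matrix(dat)
```

If the package from the tutorial is ggcorrplot, my understanding is that its `cor_pmat()` is likewise meant to be called on the data rather than on `associations`.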

