DS4PS / cpp-528-spr-2021

https://ds4ps.org/cpp-528-spr-2021/

Lab 04 - correlation matrix error #31

Open JasonSills opened 3 years ago

JasonSills commented 3 years ago

Hi,

I'm receiving this error with any variables I've tried in the correlation matrix:

Error in pairs.default(d4, upper.panel = panel.cor, lower.panel = panel.smooth) : non-numeric argument to 'pairs'

I selected the data with:

library( dplyr )  # provides select(), mutate(), sample_n(), and %>%

d <- select( d, tractid, mhv.growth, mhmval00, mhmval12, hinc00, 
             hu00, own00, rent00,  
             empclf00, clf00, unemp00, prof00,  
             dpov00, npov00,
             ag25up00, hs00, col00, 
             pop00.x, nhwht00, nhblk00, hisp00, asian00,
             cbsa, cbsaname )
d <- 
  d %>%
  mutate( p.white = 100 * nhwht00 / pop00.x,
          p.black = 100 * nhblk00 / pop00.x,
          p.hisp = 100 * hisp00 / pop00.x, 
          p.asian = 100 * asian00 / pop00.x,
          p.hs = 100 * hs00 / ag25up00,
          p.col = 100 * col00 / ag25up00,
          p.prof = 100 * prof00 / empclf00,
          p.unemp = 100 * unemp00 / clf00,
          pov.rate = 100 * npov00 / dpov00,
          p.rent = 100 * rent00/hu00,
          p.own = 100 * own00/hu00)

And used this to create the correlation plots:

# create subset to visualize in correlation matrix 
d3 <- select( d, p.white, p.rent )

# reduce data density for visualization
set.seed( 1234 )
d4 <- sample_n( d3, 10000 ) %>% na.omit()

# correlation plots
pairs( d4, upper.panel=panel.cor, lower.panel=panel.smooth )

However, no matter what data I choose for d3, I receive the same error.

Thanks,

cenuno commented 3 years ago

Hi Jason,

From the error, we see that d4 has a value in its columns that is not a number. This isn't necessarily an NA value; it could be an infinite value, which is what R returns when you divide a nonzero number by zero (e.g. 1 / 0 is Inf).
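As a rough illustration of the division-by-zero behavior, here is a small sketch on a made-up three-row data frame. The toy values and the names toy, non.finite, and kept are assumptions for the example, not the lab data. It also counts non-finite entries per column, which is one way to check whether d3 is affected.

```r
# Toy tract counts (hypothetical values, not the lab data)
toy <- data.frame( rent00 = c( 50, 0, 20 ),
                   hu00   = c( 100, 0, 0 ) )

# Same form as the p.rent calculation in the question
toy$p.rent <- 100 * toy$rent00 / toy$hu00
toy$p.rent   # 50 NaN Inf: 0/0 gives NaN, 20/0 gives Inf

# Count entries per column that are NA, NaN, or infinite
non.finite <- sapply( toy, function( x ) sum( !is.finite( x ) ) )
non.finite

# na.omit() drops the NaN row but keeps the Inf row
kept <- na.omit( toy )
nrow( kept )
```

If this is what is going on, na.omit() in the original code would leave the Inf rows in d4.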

I'm away from my laptop at the moment but my first instinct would be to drop any infinite records from d3.
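One possible way to do that in base R is sketched below, assuming every column in the subset is numeric. d3.toy is a hypothetical stand-in for d3, and all.finite and d3.clean are names made up for the example.

```r
# Toy stand-in for d3 (hypothetical values)
d3.toy <- data.frame( p.white = c( 80, Inf, 55, NaN, 40 ),
                      p.rent  = c( 30, 45, NA, 20, 60 ) )

# is.finite() is FALSE for Inf, -Inf, NA, and NaN,
# so this flags rows where every column holds a usable number
all.finite <- Reduce( `&`, lapply( d3.toy, is.finite ) )

# Keep only the fully finite rows
d3.clean <- d3.toy[ all.finite, ]
d3.clean
```

With the code from the question, the same filter would presumably go on d3 before the sample_n() step, so that the 10,000 sampled rows are all usable.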

Cheers, Cristian

JasonSills commented 3 years ago

Thanks @cenuno, that worked.

lecy commented 3 years ago

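Note that the use argument of cor() controls how missing values are handled. The default, use="everything", propagates NA and NaN into the result, while use="complete.obs" drops the rows that contain them. Inf is not treated as missing, so infinite values still need to be filtered out beforehand. From the cor() documentation: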

use: an optional character string giving a method for computing covariances in the presence of missing values. This must be (an abbreviation of) one of the strings "everything", "all.obs", "complete.obs", "na.or.complete", or "pairwise.complete.obs".
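A minimal sketch of how two of these options behave in cor(), using made-up vectors (x, y, r.everything, and r.complete are names chosen for the example). How use would be passed inside panel.cor depends on how that function is written in the lab, which is not shown in this thread.

```r
# Toy vectors with one missing value (hypothetical data)
x <- c( 1, 2, 3, 4, NA )
y <- c( 2, 4, 6, 8, 10 )

# Default behavior: the NA propagates, so the result is NA
r.everything <- cor( x, y, use = "everything" )

# Casewise deletion: the pair containing NA is dropped first,
# and the correlation is computed from the four complete pairs
r.complete <- cor( x, y, use = "complete.obs" )

r.everything
r.complete
```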