Open aramcb opened 6 years ago
You can easily extract correlation coefficients and place them in an n x n matrix (where n is the number of variables of interest) using the corrplot package. Its corrplot() function plots the output of cor(). You need to have your data in wide format, with each column corresponding to a variable of interest.
Example: corrplot(cor(df[, 1:3]), method = "number")
However, cor() computes Pearson's correlations by default, and these assume linear relationships between the variables.
(Check out the vignette here)
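As a rough sketch of the reshaping and correlation steps described above, the base R code below builds a small long-format data frame, spreads it to wide format, and computes the correlation matrix. The object names (long, wide, num) are made up for illustration, and the values are rounded from three of the variables in the original post's dput. The Spearman call is an assumption about what you might want when a linear relationship is doubtful, not something the corrplot package requires.

```r
# Toy long-format data mirroring the thread's layout
# (values rounded from the dput; only 3 of the 6 variables are used).
long <- data.frame(
  strain = rep(c("N2", "YT17", "KP4"), each = 3),
  variable = rep(c("prob", "dura", "spd"), times = 3),
  percent_diff = c(86.9, 76.5, 88.7,
                   92.4, 76.2, 96.6,
                   63.5, 64.7, 90.2)
)

# Long -> wide: one row per strain, one column per variable.
wide <- reshape(long, idvar = "strain", timevar = "variable",
                v.names = "percent_diff", direction = "wide")
names(wide) <- sub("^percent_diff\\.", "", names(wide))

# cor() needs numeric columns only, so drop the strain label.
num <- wide[, setdiff(names(wide), "strain")]

pearson  <- cor(num)                       # default method: Pearson
spearman <- cor(num, method = "spearman")  # rank-based alternative

print(round(pearson, 2))
```

Either matrix can then be passed to corrplot(), for example corrplot(pearson, method = "number"). With only three strains, each coefficient rests on three points, so the values are best read as descriptive.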
If you want to look into plotting each variable against every other one, there are ways to do this in base plot, lattice and ggplot.
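For the base plot route, one possible approach is a scatterplot matrix with pairs(). This is only a sketch: the scores data frame and the panel_cor helper are hypothetical names, and the values are rounded from the original post's data and assumed to be in wide format already.

```r
# Hypothetical wide-format data: one row per strain, one column per variable.
scores <- data.frame(
  prob = c(86.9, 92.4, 63.5),
  dura = c(76.5, 76.2, 64.7),
  spd  = c(88.7, 96.6, 90.2),
  row.names = c("N2", "YT17", "KP4")
)

# Upper-panel helper: print the Pearson r for each pair of variables.
panel_cor <- function(x, y, ...) {
  usr <- par("usr")
  on.exit(par(usr = usr))       # restore the panel's coordinates afterwards
  par(usr = c(0, 1, 0, 1))      # use 0-1 coordinates to centre the label
  text(0.5, 0.5, sprintf("r = %.2f", cor(x, y)))
}

# Scatterplots below the diagonal, correlation coefficients above it.
pairs(scores, upper.panel = panel_cor, pch = 19)
```

Each lower panel is the scatterplot of one variable's percent_diff against another's, which is the plot asked for in the question.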
There's a good chapter on data exploration in Analyzing Ecological Data by Zuur et al., 2007 (available as an eBook via UBC library) that outlines various visualization methods for digging into your data.
@dtavern ah what a great package! thx!
I have a dataframe (df) that looks like the one below (dput at bottom of post). There are 3 strains of animals (N2, YT17, KP4) with 6 response variables (e.g., probability, duration, speed, etc.) and a corresponding score (percent_diff) for each of the variables. I would like to see whether, across animal strains, the scores (percent_diff) are correlated with each other. For example, does a high probability score (percent_diff) correlate with a high duration score (percent_diff)?
Is there a quick way to draw a scatterplot correlating each variable's score with every other variable? So for instance, a scatterplot where the x-value is (percent_diff) for duration and the y-value is percent_diff for speed?
I am aware I can spread the variable scores but that does not quickly solve the correlation issue.
Let me know if you have any tips for this! Thank you,
df <- structure(list(strain = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L), .Label = c("N2", "3-days-old", "VG88_1s", "YT17", "KP4", "VC1052", "DA1371", "CB120", "EK228", "FX05775", "KG518", "KG744", "KP1182", "lid_off", "MH24301", "PY1589", "RB1256", "RB824", "RM2710", "TM3577", "VC1052_cntm", "VC117", "VC20144", "VC228", "VM487"), class = "factor"), variable = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L), .Label = c("prob", "dura", "spd", "mag", "dist", "alrev" ), class = "factor"), percent_diff = c(86.9321198291465, 76.4846917917094, 88.7306176350036, 57.5624885696056, 67.5176298547265, 85.2178945914899, 92.4254628567501, 76.2359628573487, 96.6405841321961, 70.2954995722212, 74.331748324351, 80.7151938970121, 63.5297840817911, 64.7310412896858, 90.1554309717398, 41.383659140458, 59.6974911225825, 91.6167664670659 )), .Names = c("strain", "variable", "percent_diff"), row.names = c(NA, -18L), class = "data.frame")