commfish / coho_known_age_study


test for normality function #9

Open fssem1 opened 5 years ago

fssem1 commented 5 years ago

test for normality

    eda.norm <- function(x, ...) {
      par(mfrow = c(2, 2))
      if (sum(is.na(x)) > 0)
        warning("NA's were removed before plotting")
      x <- x[!is.na(x)]
      hist(x, main = "Histogram and non-\nparametric density estimate", prob = TRUE)
      iqd <- summary(x)[5] - summary(x)[2]  # interquartile distance
      lines(density(x, width = 2 * iqd))
      boxplot(x, main = "Boxplot", ...)
      qqnorm(x)
      qqline(x)
      plot.ecdf(x, main = "Empirical and normal cdf")
      LIM <- par("usr")
      y <- seq(LIM[1], LIM[2], length = 100)
      lines(y, pnorm(y, mean(x), sqrt(var(x))))
      shapiro.test(x)
    }

fssem1 commented 5 years ago

Priest email from 5-13-2019: Running the Shapiro-Wilk test on the 8 combinations of 2 rivers, 2 variables, and 2 ages shows that 4 are normal, 1 is marginally normal, and 3 are not normal. Summary table below:

For the non-normal variables, here's what they look like before transformation, with a normal curve overlaid (using each variable's own mean and sigma):

Visually, these don't look like severe violations of normality: they aren't bimodal, the means are fairly close to where they "should" be, and the tails match fairly closely. I used the Box-Cox method to find optimal transformations for these; the histograms of the transformed data look very similar, with the same number of issues.
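For anyone wanting to reproduce this kind of check, here is a minimal sketch of the workflow described in the email: one Shapiro-Wilk test per river/variable/age group, then a Box-Cox search on the least normal group. It is not the code behind the email. The data are simulated, and the column names (`river`, `variable`, `age`, `value`), the group labels, and the lambda grid are all placeholder assumptions. `boxcox()` comes from the MASS package and requires strictly positive values.

```r
library(MASS)  # provides boxcox()

set.seed(1)
# Simulated placeholder data: 2 rivers x 2 variables x 2 ages, 40 values per group.
dat <- expand.grid(rep = 1:40,
                   river = c("river_a", "river_b"),
                   variable = c("var_1", "var_2"),
                   age = c("age_1", "age_2"),
                   stringsAsFactors = FALSE)
# Make var_2 right-skewed so that some groups are likely to fail the test.
dat$value <- ifelse(dat$variable == "var_1",
                    rnorm(nrow(dat), mean = 50, sd = 5),
                    rlnorm(nrow(dat), meanlog = 3, sdlog = 0.6))

# One Shapiro-Wilk test per river x variable x age group.
groups <- split(dat$value, list(dat$river, dat$variable, dat$age), drop = TRUE)
sw <- data.frame(
  group = names(groups),
  W = vapply(groups, function(v) unname(shapiro.test(v)$statistic), numeric(1)),
  p_value = vapply(groups, function(v) shapiro.test(v)$p.value, numeric(1)),
  row.names = NULL
)
print(sw)

# Box-Cox on the group with the smallest p-value: pick the lambda that
# maximizes the profile log-likelihood over the grid.
v <- groups[[which.min(sw$p_value)]]
bc <- boxcox(v ~ 1, lambda = seq(-2, 2, by = 0.05), plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]

# Apply the transformation (log when lambda is effectively zero).
v_bc <- if (abs(lambda_hat) < 1e-8) log(v) else (v^lambda_hat - 1) / lambda_hat

cat("lambda:", lambda_hat, "\n")
cat("Shapiro-Wilk p before:", shapiro.test(v)$p.value, "\n")
cat("Shapiro-Wilk p after: ", shapiro.test(v_bc)$p.value, "\n")
```

Comparing the p-values before and after transformation gives a numeric counterpart to the histogram comparison described above.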

fssem1 commented 5 years ago

http://userwww.sfsu.edu/efc/classes/biol710/discrim/discrim.pdf

Good article about assumptions. It sounds like as long as the non-normality is caused by skewness and NOT outliers, we are fine. I think we still need to get the reference below for our manuscript, though, to justify the analysis given that some of our variables are non-normal.

Tabachnick, B.G. and L.S. Fidell. 1996. Using Multivariate Statistics. Harper Collins College Publishers: New York.
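One rough way to check whether non-normality comes from skewness or from outliers is to look at a skewness estimate alongside simple outlier flags. The sketch below uses simulated data. The 1.5 x IQR rule and the |z| > 3 cutoff are common rules of thumb chosen here as assumptions, not criteria taken from the article or the reference above.

```r
set.seed(7)
x <- rgamma(150, shape = 4, rate = 1)  # simulated right-skewed sample

# Moment-based sample skewness (0 for a symmetric distribution).
m <- mean(x)
skew <- mean((x - m)^3) / mean((x - m)^2)^1.5

# Points beyond 1.5 * IQR from the hinges, as flagged by boxplot().
iqr_outliers <- boxplot.stats(x)$out

# Standardized scores; |z| > 3 is a common rule-of-thumb cutoff.
z <- (x - m) / sd(x)
z_outliers <- x[abs(z) > 3]

cat("skewness:", round(skew, 2), "\n")
cat("IQR-rule outliers:", length(iqr_outliers), "\n")
cat("|z| > 3 outliers:", length(z_outliers), "\n")
```

A clearly nonzero skewness with few or no flagged points would suggest skew-driven non-normality. A handful of extreme points driving the result would point to outliers instead.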