Closed crtahlin closed 10 years ago
Implemented the Holm-Bonferroni correction and Q-value (?p.adjust methods "holm" and "BY") in 08bab4e2f33a193201f1f0d089a499ea27a4d5fe .
Researching how to do permutation tests. @llaarraa : should permutation tests be done with a certain number of simulations (resampling), or should they be done "exhaustively" (so that all possible permutations are done)? Do you know of a package that does the right kind of permutation tests?
Update: Will probably use http://cran.r-project.org/web/packages/permute/vignettes/permutations.pdf, now reading about it.
I have implemented the permutation tests (1000 random permutations) for both tables on Distribution...: by grouping variable in commit 4c034fb06b0971dc2459bb5bd4f4762bfe5966c3 .
@llaarraa : The permutation tests for medians give high P values (compared to column "P value"). But thinking about it - it makes sense, since mostly there are no differences between medians for the groups (most are 0), so if the P values are supposed to be for the difference of the medians, the calculated difference (of mostly 0) is in line with the null hypothesis of no difference in medians. Off course, looking into it, the wilcox.test() (whose P value in in the column "P value") does not actually compare the difference of the medians, but whether the samples are different (or something to that effect). So the question is: is the column "Permutation based P value " att all needed in the table for medians for both groups?
Trying it now: I am getting the error: could not find function "shuffleSet"
Does this library provide multivariate permutation testing or univariate?
I had in mind a multivariate permutation testing approach, which can be easily implemented for simple univariate tests like chi-squared or t-test. See the code below. The number of permutations might be 999
my.data=read.delim("C:/Users/lara/Dropbox/medplot/ForSymptoms/DataEM.txt", sep="\t")
setwd("C:/Users/lara/Dropbox/didatticaStat/PhDStatistics-Biostatistics(modul)/out")
my.data.t0=my.data[my.data$Measurement==0,]
which.symptoms=c(9:24)
data.symptoms=my.data.t0[, which.symptoms]
data.symptoms.yn=ifelse(my.data.t0[, which.symptoms]>0, 1, 0)
num.symptoms=num.var=ncol(data.symptoms.yn)
num.samples=nrow(data.symptoms.yn)
which.var=4 #4: sex, 6: culture
my.var=my.data.t0[,which.var]
OR=chi.p=numeric(num.symptoms)
for(i in 1:num.symptoms){
my.table=table(data.symptoms.yn[,i]==1, my.var==levels(my.var)[2])
OR[i]=my.table[1,1]*my.table[2,2]/my.table[1,2]/my.table[2,1]
chi.p[i]=chisq.test(my.table)$p.value
}
chi.p.bonf=chi.p*num.var
chi.p.bonf=ifelse(chi.p.bonf>1, 1, chi.p.bonf)
#permutation p-value
set.seed(1234)
B=999
chi.p.min=numeric(B)
chi.p.perm=numeric(num.var)
for(b in 1:B){
for(i in 1:num.symptoms){
my.var.perm=sample(my.var)
my.table=table(data.symptoms.yn[,i]==1, my.var.perm=="Male")
#OR[i]=my.table[1,1]*my.table[2,2]/my.table[1,2]/my.table[2,1]
chi.p.perm[i]=chisq.test(my.table)$p.value
}
chi.p.min[b]=min(chi.p.perm)
}#end for b
chi.p.adj.perm=unlist(lapply(1:num.var, function(i) (sum(chi.p[i]>=chi.p.min)+1)/(B+1)) )
chi.p.holm=rep(NA, num.var)
my.order=order(chi.p)
write.table(cbind(names(my.data.t0[, which.symptoms]), chi.p, chi.p.bonf, chi.p.holm, chi.p.adj.perm)[my.order,], file="ResSexMT.txt", sep="\t")
####################### analysis for response
which.var=8 #4: sex, 6: culture
my.var=my.data.t0[,which.var]
OR=chi.p=numeric(num.symptoms)
for(i in 1:num.symptoms){
my.table=table(data.symptoms.yn[,i]==1, my.var==levels(my.var)[2])
OR[i]=my.table[1,1]*my.table[2,2]/my.table[1,2]/my.table[2,1]
chi.p[i]=chisq.test(my.table)$p.value
}
chi.p.bonf=chi.p*num.var
chi.p.bonf=ifelse(chi.p.bonf>1, 1, chi.p.bonf)
#permutation p-value
set.seed(1234)
B=999
chi.p.min=numeric(B)
chi.p.perm=numeric(num.var)
for(b in 1:B){
for(i in 1:num.symptoms){
my.var.perm=sample(my.var)
my.table=table(data.symptoms.yn[,i]==1, my.var.perm==levels(my.var)[2])
#OR[i]=my.table[1,1]*my.table[2,2]/my.table[1,2]/my.table[2,1]
chi.p.perm[i]=chisq.test(my.table)$p.value
}
chi.p.min[b]=min(chi.p.perm)
}#end for b
chi.p.adj.perm=unlist(lapply(1:num.var, function(i) (sum(chi.p[i]>=chi.p.min)+1)/(B+1)) )
chi.p.holm=numeric(num.var)
my.order=order(chi.p)
chi.p.holm[my.order]=chi.p[my.order]*c(num.var:1)
chi.p.holm=ifelse(chi.p.holm>1, 1, chi.p.holm)
for(i in 1:(num.var-1))
chi.p.holm[my.order][i+1]=ifelse(chi.p.holm[my.order][i+1]>chi.p.holm[my.order][i], chi.p.holm[my.order][i+1], chi.p.holm[my.order][i])
chi.p.fdr=numeric(num.var)
chi.p.fdr[my.order]=chi.p[my.order] * num.var / (1:num.var)
for(i in 1:(num.var-1))
chi.p.fdr[my.order][i+1]=ifelse(chi.p.fdr[my.order][i+1]>chi.p.fdr[my.order][i], chi.p.fdr[my.order][i+1], chi.p.fdr[my.order][i])
chi.p.fdr[chi.p.fdr>1]=1
ifelse(chi.p.fdr>1, 1, chi.p.fdr)
chi.p.fdr[my.order]
write.table(cbind(names(my.data.t0[, which.symptoms]), chi.p, chi.p.bonf, chi.p.holm, chi.p.adj.perm, chi.p.fdr)[my.order,], file="ResRespMT.txt", sep="\t")
##################### age - t-test
####################### analysis for response
which.var=5 #4: sex, 6: culture
my.var=my.data.t0[,which.var]
OR=t.p=numeric(num.symptoms)
chi.p.bonf=chi.p*num.var
chi.p.bonf=ifelse(chi.p.bonf>1, 1, chi.p.bonf)
for(i in 1:num.symptoms){
#my.table=table(data.symptoms.yn[,i]==1, my.var==levels(my.var)[2])
#OR[i]=my.table[1,1]*my.table[2,2]/my.table[1,2]/my.table[2,1]
chi.p[i]=t.test(my.var ~data.symptoms.yn[,i])$p.value
}
#permutation p-value
set.seed(1234)
B=999
chi.p.min=numeric(B)
chi.p.perm=numeric(num.var)
for(b in 1:B){
for(i in 1:num.symptoms){
my.var.y=sample(data.symptoms.yn[,i])
#my.table=table(data.symptoms.yn[,i]==1, my.var.perm=="Male")
#OR[i]=my.table[1,1]*my.table[2,2]/my.table[1,2]/my.table[2,1]
chi.p.perm[i]=t.test(my.var ~my.var.y)$p.value
}
chi.p.min[b]=min(chi.p.perm)
}#end for b
chi.p.adj.perm=unlist(lapply(1:num.var, function(i) (sum(chi.p[i]>=chi.p.min)+1)/(B+1)) )
You have to have the library(permute)
for the permutation tests (have added it to DESCRIPTION in ebb4c5eb8e8c0dc3c912ca72dcb4c143e96dd78d.
I thought the permutation tests were for computing a P value non-parametricaly, now I see you probably meant some kind of permutation test for correcting the P value for multiple comparisons. Actualy, now thinking about it, maybe it does support multivariate test... Will have to read your code and some theory and will get back to you.
Done permutation based P value correction for proportions in a4f8eb82a92ad616dd79c90183c8473d2c0c2a92.
@llaarraa : Still haven't implemented permutation based P value correction for medians, since I do not completely understand what the P values there represent. (See also my comment https://github.com/crtahlin/medplot/issues/61#issuecomment-40603769 ). Since the wilcox.test()
does not compare the difference in medians, but whether or not the samples are different (as far as I understand from ?wilcox.test()
, either:
wilcox.test()
is not appropriate (in that case the values in the table are wrong)wilcox.test
is appropriate, I guess i could do the permutation test analogously as for the proportions?For the Mann-Whitney test the null hypothesis is that the distributions are the same in the two populations. I would simply use this test and correct the table description.
Implemented multivariate permutation tests also for the "medians" table in commit 7d1c30830273b6eadac465e23c36dd5d65abd160 . All implemented, closing.
To both tables add columns with the modified P value:
Later do also: