Distr. of vars by group. - add P values containing multiple testing corrections

crtahlin commented 10 years ago

To both tables add columns with the modified P value:

Bonferonni/Hall correction (the smallest P values * number of symptoms; the second smallest P value * number of symptoms-1; etc;) Sort symptoms by smallest P value.

Later do also:

Q-value (controls false discovery rate)
permutation tests P value (only if the user clicks a button, since they are time consuming)

crtahlin commented 10 years ago

Implemented the Holm-Bonferroni correction and Q-value (?p.adjust methods "holm" and "BY") in 08bab4e2f33a193201f1f0d089a499ea27a4d5fe .

Researching how to do permutation tests. @llaarraa : should permutation tests be done with a certain number of simulations (resampling), or should they be done "exhaustively" (so that all possible permutations are done)? Do you know of a package that does the right kind of permutation tests?

crtahlin commented 10 years ago

Update: Will probably use http://cran.r-project.org/web/packages/permute/vignettes/permutations.pdf, now reading about it.

crtahlin commented 10 years ago

I have implemented the permutation tests (1000 random permutations) for both tables on Distribution...: by grouping variable in commit 4c034fb06b0971dc2459bb5bd4f4762bfe5966c3 .

@llaarraa : The permutation tests for medians give high P values (compared to column "P value"). But thinking about it - it makes sense, since mostly there are no differences between medians for the groups (most are 0), so if the P values are supposed to be for the difference of the medians, the calculated difference (of mostly 0) is in line with the null hypothesis of no difference in medians. Off course, looking into it, the wilcox.test() (whose P value in in the column "P value") does not actually compare the difference of the medians, but whether the samples are different (or something to that effect). So the question is: is the column "Permutation based P value " att all needed in the table for medians for both groups?

llaarraa commented 10 years ago

Trying it now: I am getting the error: could not find function "shuffleSet"

llaarraa commented 10 years ago

Does this library provide multivariate permutation testing or univariate?

I had in mind a multivariate permutation testing approach, which can be easily implemented for simple univariate tests like chi-squared or t-test. See the code below. The number of permutations might be 999

my.data=read.delim("C:/Users/lara/Dropbox/medplot/ForSymptoms/DataEM.txt", sep="\t")

setwd("C:/Users/lara/Dropbox/didatticaStat/PhDStatistics-Biostatistics(modul)/out")

my.data.t0=my.data[my.data$Measurement==0,]

which.symptoms=c(9:24)
data.symptoms=my.data.t0[, which.symptoms]
data.symptoms.yn=ifelse(my.data.t0[, which.symptoms]>0, 1, 0)

num.symptoms=num.var=ncol(data.symptoms.yn)
num.samples=nrow(data.symptoms.yn)

which.var=4 #4: sex, 6: culture
my.var=my.data.t0[,which.var]

OR=chi.p=numeric(num.symptoms)

for(i in 1:num.symptoms){

    my.table=table(data.symptoms.yn[,i]==1, my.var==levels(my.var)[2])
    OR[i]=my.table[1,1]*my.table[2,2]/my.table[1,2]/my.table[2,1]
    chi.p[i]=chisq.test(my.table)$p.value   

    }

chi.p.bonf=chi.p*num.var
chi.p.bonf=ifelse(chi.p.bonf>1, 1, chi.p.bonf)

#permutation p-value    

set.seed(1234)
B=999
chi.p.min=numeric(B)
chi.p.perm=numeric(num.var)

for(b in 1:B){

for(i in 1:num.symptoms){

    my.var.perm=sample(my.var)
    my.table=table(data.symptoms.yn[,i]==1, my.var.perm=="Male")
    #OR[i]=my.table[1,1]*my.table[2,2]/my.table[1,2]/my.table[2,1]
    chi.p.perm[i]=chisq.test(my.table)$p.value  

    }   

    chi.p.min[b]=min(chi.p.perm)

    }#end for b

chi.p.adj.perm=unlist(lapply(1:num.var, function(i) (sum(chi.p[i]>=chi.p.min)+1)/(B+1)) )

chi.p.holm=rep(NA, num.var) 

my.order=order(chi.p)   

write.table(cbind(names(my.data.t0[, which.symptoms]), chi.p, chi.p.bonf, chi.p.holm, chi.p.adj.perm)[my.order,], file="ResSexMT.txt", sep="\t")    

####################### analysis for response   

which.var=8 #4: sex, 6: culture
my.var=my.data.t0[,which.var]

OR=chi.p=numeric(num.symptoms)

for(i in 1:num.symptoms){

    my.table=table(data.symptoms.yn[,i]==1, my.var==levels(my.var)[2])
    OR[i]=my.table[1,1]*my.table[2,2]/my.table[1,2]/my.table[2,1]
    chi.p[i]=chisq.test(my.table)$p.value   

    }

chi.p.bonf=chi.p*num.var
chi.p.bonf=ifelse(chi.p.bonf>1, 1, chi.p.bonf)

#permutation p-value    

set.seed(1234)
B=999
chi.p.min=numeric(B)
chi.p.perm=numeric(num.var)

for(b in 1:B){

for(i in 1:num.symptoms){

    my.var.perm=sample(my.var)
    my.table=table(data.symptoms.yn[,i]==1, my.var.perm==levels(my.var)[2])
    #OR[i]=my.table[1,1]*my.table[2,2]/my.table[1,2]/my.table[2,1]
    chi.p.perm[i]=chisq.test(my.table)$p.value  

    }   

    chi.p.min[b]=min(chi.p.perm)

    }#end for b

chi.p.adj.perm=unlist(lapply(1:num.var, function(i) (sum(chi.p[i]>=chi.p.min)+1)/(B+1)) )

chi.p.holm=numeric(num.var)
my.order=order(chi.p)   

chi.p.holm[my.order]=chi.p[my.order]*c(num.var:1)
chi.p.holm=ifelse(chi.p.holm>1, 1, chi.p.holm)

for(i in 1:(num.var-1))
    chi.p.holm[my.order][i+1]=ifelse(chi.p.holm[my.order][i+1]>chi.p.holm[my.order][i], chi.p.holm[my.order][i+1], chi.p.holm[my.order][i])

chi.p.fdr=numeric(num.var)
chi.p.fdr[my.order]=chi.p[my.order] * num.var / (1:num.var) 

for(i in 1:(num.var-1))
    chi.p.fdr[my.order][i+1]=ifelse(chi.p.fdr[my.order][i+1]>chi.p.fdr[my.order][i], chi.p.fdr[my.order][i+1], chi.p.fdr[my.order][i])

        chi.p.fdr[chi.p.fdr>1]=1

ifelse(chi.p.fdr>1, 1, chi.p.fdr)
chi.p.fdr[my.order]

write.table(cbind(names(my.data.t0[, which.symptoms]), chi.p, chi.p.bonf, chi.p.holm, chi.p.adj.perm, chi.p.fdr)[my.order,], file="ResRespMT.txt", sep="\t")    

##################### age - t-test

####################### analysis for response   

which.var=5 #4: sex, 6: culture
my.var=my.data.t0[,which.var]

OR=t.p=numeric(num.symptoms)

chi.p.bonf=chi.p*num.var
chi.p.bonf=ifelse(chi.p.bonf>1, 1, chi.p.bonf)

for(i in 1:num.symptoms){

    #my.table=table(data.symptoms.yn[,i]==1, my.var==levels(my.var)[2])
    #OR[i]=my.table[1,1]*my.table[2,2]/my.table[1,2]/my.table[2,1]
    chi.p[i]=t.test(my.var ~data.symptoms.yn[,i])$p.value   

    }

#permutation p-value    

set.seed(1234)
B=999
chi.p.min=numeric(B)
chi.p.perm=numeric(num.var)

for(b in 1:B){

for(i in 1:num.symptoms){

    my.var.y=sample(data.symptoms.yn[,i])
    #my.table=table(data.symptoms.yn[,i]==1, my.var.perm=="Male")
    #OR[i]=my.table[1,1]*my.table[2,2]/my.table[1,2]/my.table[2,1]
    chi.p.perm[i]=t.test(my.var ~my.var.y)$p.value  

    }   

    chi.p.min[b]=min(chi.p.perm)

    }#end for b

chi.p.adj.perm=unlist(lapply(1:num.var, function(i) (sum(chi.p[i]>=chi.p.min)+1)/(B+1)) )

crtahlin commented 10 years ago

You have to have the library(permute) for the permutation tests (have added it to DESCRIPTION in ebb4c5eb8e8c0dc3c912ca72dcb4c143e96dd78d.

I thought the permutation tests were for computing a P value non-parametricaly, now I see you probably meant some kind of permutation test for correcting the P value for multiple comparisons. Actualy, now thinking about it, maybe it does support multivariate test... Will have to read your code and some theory and will get back to you.

crtahlin commented 10 years ago

Done permutation based P value correction for proportions in a4f8eb82a92ad616dd79c90183c8473d2c0c2a92.

@llaarraa : Still haven't implemented permutation based P value correction for medians, since I do not completely understand what the P values there represent. (See also my comment https://github.com/crtahlin/medplot/issues/61#issuecomment-40603769 ). Since the wilcox.test() does not compare the difference in medians, but whether or not the samples are different (as far as I understand from ?wilcox.test(), either:

the wilcox.test() is not appropriate (in that case the values in the table are wrong)
we do not actually want to calculate the P value for the difference of medians (in which case the explanation bellow the table was wrong - have already deleted it just in case) Which case is it? If the wilcox.test is appropriate, I guess i could do the permutation test analogously as for the proportions?

llaarraa commented 10 years ago

For the Mann-Whitney test the null hypothesis is that the distributions are the same in the two populations. I would simply use this test and correct the table description.

crtahlin commented 10 years ago

Implemented multivariate permutation tests also for the "medians" table in commit 7d1c30830273b6eadac465e23c36dd5d65abd160 . All implemented, closing.

crtahlin / medplot

Distr. of vars by group. - add P values containing multiple testing corrections #61