andreyshabalin / MatrixEQTL

Matrix eQTL: Ultra fast eQTL analysis via large matrix operations
58 stars 17 forks source link

Why pvOutputThreshold affect p value? #17

Closed xiekunwhy closed 2 years ago

xiekunwhy commented 2 years ago

Hi,

Some strange things happen when I change pvOutputThreshold.

firt run

library(MatrixEQTL) useModel = modelLINEAR SNP_file_name = "GoodGene.exp.geno.numeric.xls" expression_file_name = "GoodGene.exp.ra.exp.xls" covariates_file_name = "GoodGene.exp.cov2.xls" output_file_name = "GoodGene.exp.0.05.results.xls" pvOutputThreshold = 1.43771984533585e-08

snps = SlicedData$new() snps$fileDelimiter = "\t" # the TAB character snps$fileOmitCharacters = "NA" # denote missing values; snps$fileSkipRows = 1 # one row of column labels snps$fileSkipColumns = 1 # one column of row labels snps$fileSliceSize = 2000 # read file in slices of 2,000 rows snps$LoadFile(SNP_file_name)

gene = SlicedData$new() gene$fileDelimiter = "\t" # the TAB character gene$fileOmitCharacters = "NA" # denote missing values; gene$fileSkipRows = 1 # one row of column labels gene$fileSkipColumns = 1 # one column of row labels gene$fileSliceSize = 2000 # read file in slices of 2,000 rows gene$LoadFile(expression_file_name)

cvrt = SlicedData$new() cvrt$fileDelimiter = "\t" # the TAB character cvrt$fileOmitCharacters = "NA" # denote missing values; cvrt$fileSkipRows = 1 # one row of column labels cvrt$fileSkipColumns = 1 # one column of row labels if(length(covariates_file_name)>0) { cvrt$LoadFile(covariates_file_name) }

me = Matrix_eQTL_engine( snps = snps, gene = gene, cvrt = cvrt, output_file_name = output_file_name, pvOutputThreshold = pvOutputThreshold, useModel = useModel, verbose = TRUE, pvalue.hist = "qqplot", min.pv.by.genesnp = FALSE, noFDRsaveMemory = FALSE)

pdf(file = paste0(output_file_name, ".pdf"), width = 8, height = 8) print(me$param$dfFull)

second run

library(MatrixEQTL) useModel = modelLINEAR SNP_file_name = "GoodGene.exp.geno.numeric.xls" expression_file_name = "GoodGene.exp.ra.exp.xls" covariates_file_name = "GoodGene.exp.cov2.xls" output_file_name = "GoodGene.exp.0.01.results.xls" pvOutputThreshold = 2.8754396906717e-09

snps = SlicedData$new() snps$fileDelimiter = "\t" # the TAB character snps$fileOmitCharacters = "NA" # denote missing values; snps$fileSkipRows = 1 # one row of column labels snps$fileSkipColumns = 1 # one column of row labels snps$fileSliceSize = 2000 # read file in slices of 2,000 rows snps$LoadFile(SNP_file_name)

gene = SlicedData$new() gene$fileDelimiter = "\t" # the TAB character gene$fileOmitCharacters = "NA" # denote missing values; gene$fileSkipRows = 1 # one row of column labels gene$fileSkipColumns = 1 # one column of row labels gene$fileSliceSize = 2000 # read file in slices of 2,000 rows gene$LoadFile(expression_file_name)

cvrt = SlicedData$new() cvrt$fileDelimiter = "\t" # the TAB character cvrt$fileOmitCharacters = "NA" # denote missing values; cvrt$fileSkipRows = 1 # one row of column labels cvrt$fileSkipColumns = 1 # one column of row labels if(length(covariates_file_name)>0) { cvrt$LoadFile(covariates_file_name) }

me = Matrix_eQTL_engine( snps = snps, gene = gene, cvrt = cvrt, output_file_name = output_file_name, pvOutputThreshold = pvOutputThreshold, useModel = useModel, verbose = TRUE, pvalue.hist = "qqplot", min.pv.by.genesnp = FALSE, noFDRsaveMemory = FALSE)

pdf(file = paste0(output_file_name, ".pdf"), width = 8, height = 8) print(me$param$dfFull)

The strange thing is that there are 503951 eQTL in GoodGene.exp.0.05.results.xls (and 294781 eQTL' p-value <= 2.8754396906717e-09), but there are only 1577 eQTL in GoodGene.exp.0.01.results.xls. Why pvOutputThreshold affect p value, is this a bug of matrixEQTL? GoodGene.exp.0.01.results.xls.gz GoodGene.exp.0.05.results.xls.gz

Best, Kun

andreyshabalin commented 2 years ago

Hm, pvOutputThreshold should not affect the p-values. Would you share your data files so I can test it? Without your data files I cannot replicate the results.

andreyshabalin commented 2 years ago

Feel free to contact me directly at andrey.shabalin@gmail.com