XuegongLab / SCeQTL

cal.pvalue returns a dataframe with an empty p-value column #9

Open bolbatav opened 1 year ago

bolbatav commented 1 year ago

The title is self-explanatory. After a few hours of poking around the code, I was able to pinpoint the cause: the EM parameter that is passed between functions. It became obsolete in the dependency pscl. If you download the SCeQTL source archive, unpack it, remove the EM parameter everywhere it is mentioned, repack the archive, and install it from source, everything works fine. While you're at it, you can also insert the following line between lines 77 and 78 of the calc.q.value.R file:

qvalue = qvalue(unlist(pvalue))$qvalues

and edit current lines 78 and 79 as follows:

result = data.frame(gene.name, unlist(snp.name), unlist(pvalue), qvalue)
colnames(result) <- c("gene","snp","pvalue","qvalue")

in order to calculate both p-value and q-value in one pass over the dataset.

bolbatav commented 1 year ago

As it turns out, calculating the q-value in the same pass as the p-value is not a great idea: on real data there may be missing p-values, and the qvalue() function can't deal with missing values, so the whole function fails with an error. Instead, run na.omit() on the output table of cal.pvalue() and then apply qvalue() to its pvalue column.
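
For anyone who wants to see that last step spelled out, here is a minimal sketch. It does not call SCeQTL at all: the data frame below is an invented stand-in for the cal.pvalue() output, and the column names gene, snp and pvalue are assumed from the colnames() line earlier in this thread. It also assumes the qvalue package from Bioconductor is installed; if it is not, the q-value step is simply skipped.

```r
# Stand-in for the table returned by cal.pvalue(). The column names follow the
# ones used earlier in this thread; the values are invented for illustration.
pvals <- seq(0.005, 0.995, length.out = 100)
pvals[c(3, 50, 77)] <- NA  # simulate tests that produced no p-value
result <- data.frame(
  gene   = paste0("gene", seq_along(pvals)),
  snp    = paste0("snp", seq_along(pvals)),
  pvalue = pvals
)

# Drop every row that contains a missing value.
result_clean <- na.omit(result)

# qvalue comes from Bioconductor; skip this step if it is not installed.
if (requireNamespace("qvalue", quietly = TRUE)) {
  result_clean$qvalue <- qvalue::qvalue(result_clean$pvalue)$qvalues
}
```

Note that na.omit() drops a row if any column is NA, not just pvalue. If your table can have missing values elsewhere that you want to keep, filtering with result[!is.na(result$pvalue), ] may be the safer choice.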