HCBravoLab / metagenomeSeq

Statistical analysis for sparse high-throughput sequencing
64 stars 20 forks source link

CSS scaling factors in fitZig model matrix #68

Closed mcalgaro93 closed 5 years ago

mcalgaro93 commented 5 years ago

Hello, I'm using metagenomeSeq and many other packages for comparisons. My issue is about the way model matrix for method fitZig is obtained. The code says:

# Initializing the model matrix
    if (useCSSoffset == TRUE){
        if (any(is.na(normFactors(obj)))) {
          stop("Calculate the normalization factors first!")
        }
        mmCount <- cbind(mod, log2(normFactors(obj)/1000 + 1))
        colnames(mmCount)[ncol(mmCount)] <- "scalingFactor"
    } else { 
        mmCount <- mod
    }

So by default model matrix is made by intercept, coefficients if they are present in mod and log2(normFactors(obj)/1000 + 1). But in the paper is also written that: image

So I understand the 1000 at denominator. For my data I prefer to use the median of the scaling factor as denominator (as suggested above). So I put useCSSoffset = FALSE and I manually calculate offsets... The thing I don't get is the fact that in supplementary notes is reported a different log2 of the scaling factor. image

So which is the correct way to include the scaling factor in the model?

Wouldn't be better to put one of the solutions above as default instead of scaling per 1000? Thanks in advance, best regards.

Matteo

hcorrada commented 5 years ago

Thank you for writing. The equation on the supplementary material is incorrect, log2((s/N) + 1) is correct, as in the code you link.

We've vacilated between the 1000 vs. median scale factor in our analyses and resort to setting the offsets manually for the latter. With N=1000 interpretation is "count per thousands", which median scale factor doesn't have.