yuyingxie closed 4 years ago
I mean we calculate the size factors using computeSumFactors(B, clusters=clusters), but the code here uses log(colSums(y)).
Which one should be used?
As the code you quote clearly indicates, log(colSums(y)) is used only if is.null(sizeFactors(x)), in other words if you haven't provided pre-calculated size factors.
My question is about the difference between the two ways of calculating them.
computeSumFactors should use the pooling method, which has been argued to be more robust, and it will scale the factors so that they average to 1 (which shouldn't make a difference for their use as an offset in the model fitting), but otherwise the results should be roughly the same.
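To illustrate the point about scaling, here is a minimal base-R sketch on a made-up count matrix. It does not call computeSumFactors; it only rescales plain library sizes so that they average to 1, and it assumes the factors enter the model on the log scale. The matrix `y` and the names `lib`, `sf` and `shift` are hypothetical.

```r
# Toy count matrix: 4 genes (rows) x 3 cells (columns); the values are made up.
y <- matrix(c(10, 0,  5,  5,
              20, 2,  8, 10,
              40, 4, 16, 20),
            nrow = 4,
            dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3)))

lib <- colSums(y)        # raw library sizes per cell
sf  <- lib / mean(lib)   # the same factors rescaled to average 1

# Difference between the two candidate offsets, per cell.
# It is the same constant, log(mean(lib)), for every cell.
shift <- log(lib) - log(sf)
print(shift)
```

Because the shift is identical across cells, it would only move the intercept of a model with a log link, not the estimated differences between groups. This says nothing about how much the pooled estimates themselves differ from library sizes on real data.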
I notice that in the code

    if (is.null(sizeFactors(x))) {
        cd$ls <- log(colSums(y))
    } else {
        cd$ls <- sizeFactors(x)
    }

If we have already calculated the size factors, why do we need to calculate them again? This creates an issue: if we only want to test a few genes, the size factors are better calculated from all the genes rather than from just that handful.
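The concern about subsetting can be sketched in base R with a made-up count matrix. This is only an illustration using library-size factors, not the pooled factors from computeSumFactors; `y`, `keep`, `sf_all` and `sf_sub` are hypothetical names.

```r
# Toy count matrix: 4 genes (rows) x 3 cells (columns); the values are made up.
y <- matrix(c(10, 0,  5,  5,
              20, 2,  8, 10,
              40, 4, 16, 20),
            nrow = 4,
            dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3)))

# Size factors from all genes, computed once before any subsetting.
sf_all <- colSums(y) / mean(colSums(y))

# Keep only a handful of genes for testing.
keep  <- c("gene1", "gene2")
y_sub <- y[keep, , drop = FALSE]

# Factors recomputed from the subset alone: these generally differ from sf_all.
sf_sub <- colSums(y_sub) / mean(colSums(y_sub))

print(rbind(all_genes = sf_all, subset_only = sf_sub))
```

Since the quoted code only falls back to log(colSums(y)) when sizeFactors(x) is NULL, computing and storing the factors on the full object before subsetting to the genes of interest should avoid the recomputation, provided the stored factors are carried along when the object is subset.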