Open mebbert opened 6 years ago
Hi @mebbert, I really appreciate your detailed description, and sorry for the delayed response.
First of all, I am confused about why you get different results (30 genes vs. 126 genes) when you use the same differentialGeneTest call. The size factors shouldn't matter: I believe that if relative_expr = FALSE is set, the data won't be normalized by library size.
Should I run cells <- estimateSizeFactors(cells) and cells <- estimateDispersions(cells) with 10x Genomics UMI data? This is directly related to #50.
Estimating size factors and dispersions is still required for 10x data. Size factors adjust for library size (for example, the number of reads or the total UMIs measured in each cell). Dispersions provide a hint for the dispersion parameter used in the differentialGeneTest function.
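To make the role of size factors concrete, here is a minimal base R sketch of library-size normalization on a made-up UMI count matrix. The matrix, the variable names, and the particular formula (each cell's total divided by the geometric mean of all cell totals) are illustrative assumptions, not a copy of Monocle's internal code.

```r
# Toy UMI count matrix: 3 genes (rows) x 4 cells (columns). Values are made up.
counts <- matrix(
  c(10, 20, 40, 80,
     5, 10, 20, 40,
     1,  2,  4,  8),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4))
)

# Total UMIs per cell (the "library size").
cell_totals <- colSums(counts)

# Assumed size-factor definition: each cell's total relative to the
# geometric mean of all cell totals.
size_factors <- cell_totals / exp(mean(log(cell_totals)))

# Dividing each column by its size factor puts cells on a comparable scale.
normalized <- sweep(counts, 2, size_factors, "/")

print(size_factors)
print(colSums(normalized))
```

In this toy case the cells differ only in sequencing depth, so after normalization every cell ends up with the same total.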
Is there a bug where erroneous size factors persist even if I re-run newCellDataSet?
When you recreate a cell dataset with the newCellDataSet function, the size factors are initialized to NA.
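One way to check this yourself is to inspect the size factors right after construction. The sketch below assumes Monocle 2 is installed and uses a hypothetical toy dataset (the counts, the group column, and the gene names are all made up for illustration).

```r
library(monocle)  # assumes Monocle 2 is installed

# Hypothetical toy data: 3 genes x 4 cells of made-up UMI counts.
counts <- matrix(c(10, 5, 1, 20, 10, 2, 40, 20, 4, 80, 40, 8), nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))
pd <- new("AnnotatedDataFrame",
          data = data.frame(group = c("a", "a", "b", "b"),
                            row.names = colnames(counts)))
fd <- new("AnnotatedDataFrame",
          data = data.frame(gene_short_name = rownames(counts),
                            row.names = rownames(counts)))

cells <- newCellDataSet(counts, phenoData = pd, featureData = fd,
                        expressionFamily = negbinomial.size())

# According to the reply above, these should all be NA at this point.
print(sizeFactors(cells))

# After estimation, the size factors should be populated.
cells <- estimateSizeFactors(cells)
print(sizeFactors(cells))
```

If the first print shows non-NA values in your own session, the object was probably not rebuilt from scratch.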
Should differentialGeneTest with relative_expr = FALSE ignore whether cells <- estimateSizeFactors(cells) and cells <- estimateDispersions(cells) are run/set?
I think it is the opposite: once relative_expr is set to FALSE, the size factors are ignored when performing the DEG test.
Oh, and the small "bug" in plot_genes_jitter.
Thanks for pointing this out. We will remove the line cds_exprs$adjusted_expression <- log10(cds_exprs$expression) to avoid confusion.
Hi,
I've been getting familiar with Monocle, and I believe I found a bug when running estimateSizeFactors before running differentialGeneTest with relative_expr = FALSE. If I understand the documentation and the differentialGeneTest code properly, size factors should be ignored when relative_expr = FALSE, but I don't think that's happening.
Basically, I am getting 30 differentially expressed genes if I run my analysis with cells <- estimateSizeFactors(cells) included. I get 126 differentially expressed genes if I omit cells <- estimateSizeFactors(cells), however. I assume the result with 126 genes is the correct one, since I'm using 10x Genomics single-cell (UMI) data. I didn't realize I don't need to run cells <- estimateSizeFactors(cells).
Also, a less important (and unrelated) "bug" (if it even qualifies as a bug): it looks like plot_genes_jitter calculates cds_exprs$adjusted_expression <- log10(cds_exprs$expression), but never uses it. The data is transformed by ggplot instead.
Really appreciate your help.
Correction
Apparently you cannot run cells <- estimateDispersions(cells) without first running cells <- estimateSizeFactors(cells) in a clean environment. I deleted all objects (rm(list=ls())) and re-ran the code, getting an error stating that I had to run cells <- estimateSizeFactors(cells) first.
I do think something funky is going on though, because if I re-run the code (omitting cells <- estimateSizeFactors(cells)) without deleting all objects, I am able to run cells <- estimateDispersions(cells), and I get the different results mentioned above.
So, I guess there are potentially four different issues here:
1. Should I run cells <- estimateSizeFactors(cells) and cells <- estimateDispersions(cells) with 10x Genomics UMI data? This is directly related to #50.
2. Is there a bug where erroneous size factors persist even if I re-run newCellDataSet?
3. Should differentialGeneTest with relative_expr = FALSE ignore whether cells <- estimateSizeFactors(cells) and cells <- estimateDispersions(cells) are run/set?
4. Oh, and the small "bug" in plot_genes_jitter.