RGLab / MAST

Tools and methods for analysis of single cell assay data in R
224 stars 57 forks source link

Selecting count data in zlm function #158

Closed IrSoler closed 3 years ago

IrSoler commented 3 years ago

Dear MAST team,

I'm trying to fit a model with zlm using this code:

zlmCond_cells = zlm(~0+group+cngeneson, sce.assay_cells, exprs_values = 'log')

where sce.assay_cells is a sca object which contains 3 assays:

Independently of the assay I pick in exprs_values parameter I obtain the same result. I've also tried to pass the assay instead of the name (e.g. assay(sce.assay_cells, 'counts')) and, again, my results are always the same.

I've looked at the code in https://rdrr.io/bioc/MAST/src/R/zeroinf.R but I haven't found how the function works to select the expression values.

I was wondering if I have misunderstood the usage of this parameter or if this is a bug to report.

Any help would be appreciated. Thanks you in advance!

Best wishes, Ir

amcdavid commented 3 years ago

Thanks for the report, definitely a bug! Try installing the most recent version devtools::install_github('RGLab/MAST') which should fix this.

IrSoler commented 3 years ago

Thank you so much for your early response!!

At first, I thought that I had commited a mistake, so I decided to create a new sca object containing only the 'log' count data. When I tried it, I obtained 177 genes differentially expressed. However, executing now your recent version there are 2016 genes differentially expressed. Could be that possible?

Thanks in advance and sorry for the inconvenience, Ir

El vie, 7 may 2021 a las 15:14, Andrew McDavid @.***>) escribió:

Thanks for the report, definitely a bug! Try installing the most recent version devtools::install_github('RGLab/MAST') which should fix this.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/RGLab/MAST/issues/158#issuecomment-834373000, or unsubscribe https://github.com/notifications/unsubscribe-auth/APNUIWQ66J6I7I4ACMLNNLDTMPRTXANCNFSM44JVUZXA .

amcdavid commented 3 years ago

The changes I committed should not have introduced any differences in differential expression if you are using the same slot of the assay. So I suspect this is not actually a comparison of "like" to "like" here. If you can provide a reproducible example of the behavior you are seeing, I can take a closer look.

IrSoler commented 3 years ago

Sure! There is it:

--------------------------------------------------- OPTION 1


sce.hvg = readRDS("results/experiment.rds") # Single cell experiment object containing the data assay(sce.hvg, "log") = log2(counts(sce.hvg) + 1)

Create new sce object (only 'log' count data)

sce.hvg.1 = SingleCellExperiment(assays = list(log = assay(sce.hvg, "log"))) colData(sce.hvg.1) = colData(sce.hvg)

Scaling ngenes

cdr2_cells = colSums(assay(sce.hvg.1, "log")>0) colData(sce.hvg.1)$cngeneson = scale(cdr2_cells)

Construct sca object

sce.assay_cells = SceToSingleCellAssay(sce.hvg.1) rowData(sce.assay_cells)$primerid = rownames(sce.assay_cells)

Model

zlmCond_cells = zlm(~0+group+cngeneson, sce.assay_cells, exprs_values = "log")

Test hypothesis

zlmCond_cellsF <- lrTest(zlmCond_cells, Hypothesis("group1 - group2"))

----------------------------------------------------------- OPTION 2


sce.hvg = readRDS("results/experiment.rds") # Single cell experiment object containing the data assay(sce.hvg, "log") = log2(counts(sce.hvg) + 1)

Scaling ngenes

cdr2_cells = colSums(assay(sce.hvg, "log")>0) colData(sce.hvg)$cngeneson = scale(cdr2_cells)

Construct sca object

sce.assay_cells = SceToSingleCellAssay(sce.hvg) rowData(sce.assay_cells)$primerid = rownames(sce.assay_cells)

Model

zlmCond_cells = zlm(~0+group+cngeneson, sce.assay_cells, exprs_values = "log")

Test hypothesis

zlmCond_cellsF <- lrTest(zlmCond_cells, Hypothesis("group1 - group2"))



Sorry for the inconvenience.

El vie, 7 may 2021 a las 18:42, Andrew McDavid @.***>) escribió:

The changes I committed should not have introduced any differences in differential expression if you are using the same slot of the assay. So I suspect this is not actually a comparison of "like" to "like" here. If you can provide a reproducible example of the behavior you are seeing, I can take a closer look.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/RGLab/MAST/issues/158#issuecomment-834609791, or unsubscribe https://github.com/notifications/unsubscribe-auth/APNUIWRS24ZCCPSHV5MOZVLTMQJ6DANCNFSM44JVUZXA .

amcdavid commented 3 years ago

So you get a different result in zlmCond_cellsF in option 1, using version 1.17.4 vs option 2 and version 1.17.4? That's almost certainly a bug.

Could you send (a subset?) of the data you are using? Andrew_McDavid@urmc.rochester.edu

amcdavid commented 3 years ago

Thanks for sending the data. This was indeed still a bug related to exprs_values in some code that is called by upstream by zlm. After simplifying your code, the following fails in 1.17.4 and passes in 1.17.5 / 02c8b60.

I always appreciate when someone takes the time to open an issue and provides enough detail to reproduce, as you have done. The smaller the data and shorter code someone else working with a bug report has to run to reproduce the issue, the quicker it all goes.

install_github('RGLab/MAST')
library(MAST)
sce.hvg = readRDS("experiment.rds")[1:500,]
assay(sce.hvg, "log") = log2(counts(sce.hvg) + 1)
# Scaling ngenes
cdr2_cells = colSums(assay(sce.hvg, "log")>0)
colData(sce.hvg)$cngeneson = scale(cdr2_cells)

# Create new sce object (only 'log' count data)
sce.hvg.1 = SingleCellExperiment(assays = list(log = assay(sce.hvg, "log")))
colData(sce.hvg.1) = colData(sce.hvg)

do_zlm = function(obj){
    sca = SceToSingleCellAssay(obj)
    rowData(sca)$primerid = rownames(sca)

    # Model
    zz = zlm(~diagnosis+cngeneson, sca, exprs_values = "log", method = 'glm', ebayes = TRUE)

    # Test hypothesis
    #lrt <- lrTest(zz, Hypothesis("diagnosisMS - diagnosisControl"))
    list(zz = zz, sca = sca)
}

log_assay = do_zlm(sce.hvg.1)
all_assays = do_zlm(sce.hvg)
#log_assay2 = do_zlm(sce.hvg.1)
all.equal(log_assay$zz@coefC, all_assays$zz@coefC)
all.equal(log_assay$zz@coefD, all_assays$zz@coefD)
all.equal(log_assay$zz@deviance, all_assays$zz@deviance)
all.equal(log_assay$zz@vcovD, all_assays$zz@vcovD)
all.equal(log_assay$zz@vcovC, all_assays$zz@vcovC)
all.equal(assay(log_assay$sca, 'log'), assay(all_assays$sca, 'log'))
IrSoler commented 3 years ago

Thanks for your early response and for simplifying the code!

Sorry for bothering you another time, but when I execute the code the lrTest results are completely different (despite the fact that zz parameters all equal).

El lun, 10 may 2021 a las 19:42, Andrew McDavid @.***>) escribió:

Thanks for sending the data. This was indeed still a bug related to exprs_values in some code that is called by upstream by zlm. After simplifying your code, the following fails in 1.17.4 and passes in 1.17.5 / 02c8b60 https://github.com/RGLab/MAST/commit/02c8b60c960531e6990ff32af8aa6435923ea278 .

I always appreciate when someone takes the time to open an issue and provides enough detail to reproduce, as you have done. The smaller the data and shorter code someone else working with a bug report has to run to reproduce the issue, the quicker it all goes.

install_github('RGLab/MAST') library(MAST)sce.hvg = readRDS("experiment.rds")[1:500,] assay(sce.hvg, "log") = log2(counts(sce.hvg) + 1)# Scaling ngenescdr2_cells = colSums(assay(sce.hvg, "log")>0) colData(sce.hvg)$cngeneson = scale(cdr2_cells)

Create new sce object (only 'log' count data)sce.hvg.1 = SingleCellExperiment(assays = list(log = assay(sce.hvg, "log")))

colData(sce.hvg.1) = colData(sce.hvg)

do_zlm = function(obj){ sca = SceToSingleCellAssay(obj) rowData(sca)$primerid = rownames(sca)

# Model
zz = zlm(~diagnosis+cngeneson, sca, exprs_values = "log", method = 'glm', ebayes = TRUE)

# Test hypothesis
#lrt <- lrTest(zz, Hypothesis("diagnosisMS - diagnosisControl"))
list(zz = zz, sca = sca)

} log_assay = do_zlm(sce.hvg.1)all_assays = do_zlm(sce.hvg)#log_assay2 = do_zlm(sce.hvg.1) @.C, @.C) @.D, @.D) @., @.) @.D, @.D) @.C, @.C) all.equal(assay(log_assay$sca, 'log'), assay(all_assays$sca, 'log'))

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/RGLab/MAST/issues/158#issuecomment-837028968, or unsubscribe https://github.com/notifications/unsubscribe-auth/APNUIWRMOSXA7KYEZREI7D3TNALIXANCNFSM44JVUZXA .

amcdavid commented 3 years ago

Hoo boy, you are correct. This ended up being fairly subtle. See f8fd6ab / 1.17.6 for hopefully the definitive fix 😬

IrSoler commented 3 years ago

It worked perfectly. Thank you so much! 😃

El mar, 11 may 2021 a las 20:30, Andrew McDavid @.***>) escribió:

Hoo boy, you are correct. This ended up being fairly subtle. See f8fd6ab https://github.com/RGLab/MAST/commit/f8fd6abb4746a5645f53627f224ab2bcb8bc900b / 1.17.6 for hopefully the definitive fix 😬

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/RGLab/MAST/issues/158#issuecomment-838962430, or unsubscribe https://github.com/notifications/unsubscribe-auth/APNUIWVUWJYXHUDBY34O7WLTNFZVPANCNFSM44JVUZXA .