RGLab / MAST

Tools and methods for analysis of single cell assay data in R

Any chance of an example for RNA-seq counts? #39

Closed: JohnReid closed this issue 10 years ago.

JohnReid commented 10 years ago

I've got some single-cell RNA-seq counts (normalised with DESeq2) that I'd like to model with your mixture of continuous and discrete components. I'm having some problems getting sensible p-values from zlm.SingleCellAssay, and I'm not finding the intro vignette or the package documentation particularly helpful. Is there any chance you could put together a small example, as you did for the Fluidigm assay?

Thanks, John.

gfinak commented 10 years ago

Hi John, could you be a bit more explicit about what you are doing? We can discuss off-list if you'd like.

zlm has some problems with single-cell RNA-seq data, due to sporadic outliers in the continuous component even when there is very little evidence for expression. This can lead to overly small p-values and false positives. We are working on several approaches to improve the model for single-cell RNA-seq data. One thing that's implemented now that you could try is empirical Bayes shrinkage of the variance, by setting ebayes=TRUE in the zlm call. Can you tell me a bit more about the sense in which the p-values you get are not sensible, or perhaps provide an example?

Cheers, Greg

(Email reply to JohnReid's message of Thu, Aug 28, 2014, 8:57 AM.)
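Greg's ebayes=TRUE suggestion relies on empirical Bayes shrinkage of the variance. The Python sketch below roughly illustrates the general idea. It is not MAST's implementation, which this thread does not describe. It pulls one gene's sample variance toward a common prior variance using the limma-style moderated variance formula. The function name moderated_variance and the values prior_var=0.5 and prior_df=4 are hypothetical. In practice the prior is estimated from all genes.

```python
from statistics import variance

def moderated_variance(values, prior_var, prior_df):
    """Shrink one gene's sample variance toward a common prior variance.

    Uses the limma-style posterior variance
        (prior_df * prior_var + d * s2) / (prior_df + d),
    where s2 is the sample variance and d = n - 1 its degrees of freedom.
    """
    d = len(values) - 1
    s2 = variance(values)  # unbiased sample variance (divides by n - 1)
    return (prior_df * prior_var + d * s2) / (prior_df + d)

# Hypothetical log-scale values for a gene detected in only two cells.
# With so few cells the raw variance is tiny and unstable, which would
# make a test statistic for this gene look extremely significant.
gene = [5.0, 5.1]
raw = variance(gene)

# prior_var and prior_df are assumed known here; an empirical Bayes
# method would estimate them from all genes.
shrunk = moderated_variance(gene, prior_var=0.5, prior_df=4)
print(raw, shrunk)
```

A gene seen in only a couple of cells has its near-zero variance raised toward the prior. This damps the kind of spurious significance Greg describes.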

JohnReid commented 10 years ago

Hi, thanks for the quick response. I've worked out that I had a typo in my hypothesis argument to zlm.SingleCellAssay: I had inadvertently capitalised Time, which slipped by without a warning. Sorry for the noise.

After fixing that, I'm getting some p-values and am exploring them now. I'll definitely try the empirical Bayes option, so thanks for that suggestion. Also, I don't know whether you have any experience with DESeq2, but if you have an opinion on whether it makes sense to normalise the counts using size factors before passing them to SingleCellAssay, I would be pleased to hear it.

Many thanks for providing the package, John.
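John's question concerns size factors. DESeq2's default size factors follow the median-of-ratios idea: each sample's factor is the median, over genes, of its count divided by that gene's geometric mean across samples. The pure-Python sketch below illustrates that idea. It is not DESeq2's code, and the function names size_factors and normalise are hypothetical. Any gene with a zero count in some cell has a geometric mean of zero, so it is skipped. In sparse single-cell data this can leave few genes to take the median over.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of raw counts (one per cell).
    Returns one size factor per cell.
    """
    n_cells = len(counts[0])
    # Keep only genes with a positive count in every cell, so that the
    # log geometric mean is finite.
    usable = [g for g in counts if all(c > 0 for c in g)]
    if not usable:
        raise ValueError("no gene is non-zero in every cell")
    log_geo_means = [sum(math.log(c) for c in g) / n_cells for g in usable]
    factors = []
    for j in range(n_cells):
        # Log ratio of this cell's count to the gene's geometric mean.
        log_ratios = [math.log(g[j]) - lgm for g, lgm in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors

def normalise(counts, factors):
    """Divide each cell's counts by that cell's size factor."""
    return [[c / s for c, s in zip(g, factors)] for g in counts]

# Hypothetical counts: 4 genes x 2 cells; cell 2 was sequenced twice as deeply.
counts = [[10, 20], [5, 10], [0, 7], [100, 200]]
factors = size_factors(counts)
normalised = normalise(counts, factors)
print(factors)
```

Here the deeper cell gets a factor twice as large as the shallower one, so after normalisation the first gene has the same value in both cells.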

JohnReid commented 10 years ago

Just a quick question or two about the arguments to the SingleCellAssay constructor.

  • Am I right in assuming they should be on an untransformed scale (i.e. not log-transformed)?
  • How is zero expression fit by the model? Does it fit small expression values as 0, or do they actually have to be 0? I ask because my normalised counts look like the following (on the log scale). I have some zero counts, but I also want to model those small counts as 0.

[image: index] https://cloud.githubusercontent.com/assets/1790516/4106260/c6043af4-31b7-11e4-9c18-302bdfd76e49.png

Thanks!

gfinak commented 10 years ago

No thresholding is done by default. You could threshold, but it's difficult to say where that threshold should be. It's something we are working on.

Differential expression is on the log-transformed scale. We model the data as log-normal.

You can construct the object with transformed or untransformed data. The concept of "layers" in the object lets you deal with different transformations of the data.

(Email reply to JohnReid's message of Sep 1, 2014, 2:13 AM.)
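The following Python sketch makes the exchange above concrete by preparing one gene's normalised counts for a two-part (hurdle) model. It transforms the values with log2(x + 1), optionally sets values below a threshold to zero, and then summarises the discrete part (fraction of cells expressing) and the continuous part (mean log expression among expressing cells). The pseudocount of 1, the threshold of 1.0, and all function names are hypothetical. As Greg notes, where to threshold is an open question, and this is not MAST's own code.

```python
import math

def log_transform(counts, pseudocount=1.0):
    """log2(x + pseudocount); with pseudocount=1, zero counts map to exactly 0."""
    return [math.log2(c + pseudocount) for c in counts]

def apply_threshold(values, threshold):
    """Set log-scale values below a (hypothetical) threshold to zero."""
    return [v if v >= threshold else 0.0 for v in values]

def hurdle_summary(values):
    """Return (fraction of cells with value > 0,
    mean value among those cells, or None if no cell expresses)."""
    positive = [v for v in values if v > 0]
    frac = len(positive) / len(values)
    mean_pos = sum(positive) / len(positive) if positive else None
    return frac, mean_pos

# Hypothetical normalised counts for one gene across six cells.
counts = [0.0, 0.3, 0.0, 7.0, 15.0, 0.6]
logged = log_transform(counts)
thresholded = apply_threshold(logged, threshold=1.0)
frac, mean_pos = hurdle_summary(thresholded)
print(thresholded, frac, mean_pos)
```

Without the threshold, the small values 0.3 and 0.6 would count as expressed and pull down the continuous mean. With it, they move into the discrete zero component, which is the behaviour John asks about.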