Vignettes and also general questions

cozygene / bisque

An R toolkit for estimation of cell composition from bulk expression data

Vignettes and also general questions #13

Closed uklejaj closed 4 years ago

uklejaj commented 4 years ago

Hello and thank you for this wonderful package.

I was attempting to look at the vignette after installing BisqueRNA successfully, but running browseVignettes("BisqueRNA") returned: No vignettes found by browseVignettes("BisqueRNA"). Would you happen to know what might be causing this?

Additionally, I was wondering whether it would be possible to estimate the cell proportions of bulk samples from single-cell reference data for the different cell types. Say, for example, I had a bulk sample made up of some combination of four cell types: could I extract the proportions using expected RNA-seq profiles as a single-cell reference, taken as an average from bulk studies? In this case, there would be only one sample for the single-cell reference and one bulk sample to deconvolute. From crude testing, it seems the issue is that at least two subjects are needed, and I'm not sure whether there is a minimum amount of single-cell data required for each cell type.

Thanks!

brandonjew commented 4 years ago

Hi,

Thanks for the feedback!

If you installed with devtools::install_github, the vignette is not built by default. You should either install through CRAN (install.packages) or run devtools::install_github("cozygene/bisque", build_vignettes = TRUE). The vignette can also be found online here: https://cran.r-project.org/web/packages/BisqueRNA/vignettes/bisque.html
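The two installation routes described above can be sketched in R as follows. This is a minimal sketch rather than an official recipe: the GitHub route is commented out so the script does not install the package twice, and the note about knitr and rmarkdown is an assumption about what vignette building typically needs.

```r
# Option 1: install the CRAN release, which ships with the prebuilt vignette.
install.packages("BisqueRNA")

# Option 2: install from GitHub and build the vignette during installation.
# Assumption: building vignettes may require extra packages such as knitr
# and rmarkdown to be installed first.
# devtools::install_github("cozygene/bisque", build_vignettes = TRUE)

# Either way, check that the vignette is now available.
browseVignettes("BisqueRNA")
```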

For your second point, the reference-based decomposition requires multiple samples (in bulk and single-cell) to learn a transformation between bulk and single-cell measurements. Our method cannot do this with only one sample. However, if you could further elaborate on the process of averaging bulk studies to generate a single-cell reference, we can discuss alternative approaches.

Let me know if you have any other questions.

Thanks, Brandon

uklejaj commented 4 years ago

One of my main interests in using Bisque is to see whether, after the fact, we could extract cell-type proportions from bulk RNA-seq data using single-cell data. The cell types are quite similar genetically, so relying on markers might not be feasible. I have been testing different deconvolution methods under these conditions: bulk samples created by combining known quantities of single cells, to see whether Bisque could deconvolute them. While the different combinations could be viewed as different samples, for this test they would all be generated from the same single-cell data.

While we have single-cell data for each cell type, due to the vast heterogeneity of these types I wanted to first test whether Bisque could deconvolute proportions in a base case. I have the average counts of these cell lines taken from the Cancer Cell Line Encyclopedia (CCLE). Therefore, I created a single-cell eset containing only one cell of each type. Would this not work?

Best, Jake

brandonjew commented 4 years ago

This is a valid application of Bisque, even with cell types that have closely related marker genes, if multiple samples are present. In this case, the single-cell eset should contain the entire single-cell data since we use the distribution of cell proportions in addition to the average single-cell expression.
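To illustrate why cell-level data matters here, the base-R sketch below computes per-subject cell-type proportions from cell annotations. The labels and object names are invented toy values, and this is not Bisque's internal code. It only shows the kind of information that is lost once each cell type is collapsed into a single averaged profile.

```r
# Toy cell-level annotations (hypothetical): one entry per sequenced cell.
subject   <- c("S1", "S1", "S1", "S1", "S2", "S2", "S2", "S2")
cell.type <- c("A",  "A",  "A",  "B",  "A",  "B",  "B",  "B")

# Per-subject cell-type proportions: rows are subjects, columns are cell types.
props <- prop.table(table(subject, cell.type), margin = 1)
print(props)

# Spread of each cell type's proportion across subjects.
prop.sd <- apply(props, 2, sd)
print(prop.sd)

# With a single averaged "cell" per type, there is one subject and no spread.
collapsed <- prop.table(table(subject = rep("avg", 2), cell.type = c("A", "B")),
                        margin = 1)
print(collapsed)
```

With only one averaged profile per cell type, the proportion table has a single row, so there is no distribution across subjects left to estimate.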

I will note that Bisque and other recent methods require multiple samples in order to account for the fact that bulk RNA-seq typically differs significantly from simulated mixtures of single-cell data (due to technical confounding effects that we attempt to identify). We have observed this behavior in matched samples that have both single-cell and bulk RNA-seq available.

The validation of several methods includes similar simulations, where bulk expression is simulated by aggregating single-cell data. For example, the authors of CIBERSORTx performed deconvolution on tumor samples simulated by aggregating single-cell data.
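A simulation of this kind can be sketched in base R by summing the counts of a known number of cells per type. Everything below is an assumption for illustration (the Poisson toy counts, the cell pool, the mixing design, and the helper simulate_bulk); it is not the procedure used by Bisque or CIBERSORTx.

```r
set.seed(42)

# Toy single-cell pool (hypothetical): 6 genes, 50 cells of each of two types.
genes     <- paste0("gene", 1:6)
pool.type <- rep(c("A", "B"), each = 50)
sc.counts <- matrix(rpois(length(genes) * length(pool.type), lambda = 5),
                    nrow = length(genes),
                    dimnames = list(genes, paste0("cell", seq_along(pool.type))))

# Known numbers of cells of each type mixed into each simulated bulk sample.
design <- rbind(mix1 = c(A = 30, B = 10),
                mix2 = c(A = 10, B = 30),
                mix3 = c(A = 20, B = 20))

# Draw the requested number of cells per type and sum their counts per gene.
simulate_bulk <- function(n.per.type) {
  picked <- unlist(lapply(names(n.per.type), function(ct) {
    sample(which(pool.type == ct), n.per.type[[ct]])
  }))
  rowSums(sc.counts[, picked, drop = FALSE])
}

# Genes x mixtures matrix of simulated bulk counts.
bulk.counts <- sapply(rownames(design), function(m) simulate_bulk(design[m, ]))

# Ground-truth proportions to compare against deconvolution estimates.
true.props <- design / rowSums(design)
print(true.props)
```

Mixtures built this way share the technical characteristics of the single-cell data, so they do not reproduce the bulk versus single-cell differences mentioned above.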