Closed Biomiha closed 5 years ago
Hi Miha,
Thanks for using the package! The issue with the pbmc_3k dataset is that it contains cells from only one individual. Currently, the reference-based decomposition requires multiple individuals in the single-cell data in order to learn the decomposition transformation.
I will update the package to clarify the requirement for multiple samples for this function. Thanks again and let me know if you have any other questions!
Brandon
Thanks for the clarification Brandon. How many in your opinion would be needed for a reasonable estimate, obviously bearing in mind these would be different individuals for the scRNAseq data and different for the bulk?
The smallest number we used was 6 and we observed reasonable estimates without using the overlap option. The method utilizes the observed variance in cell proportions from the single-cell data, so that should be kept in mind when considering single-cell dataset size.
OK, thanks. It seems I can get at least 9 (https://bioconductor.org/packages/release/data/experiment/vignettes/TENxPBMCData/inst/doc/TENxPBMCData.html). I'll have a go and let you know how I get on. BW, Miha
There may be an issue with these datasets being sequenced with different kits, since batch effects from the different kits (not sure how significant they are) could cause issues with the generated reference profile. The donor_a, donor_b, donor_c, 4k, and 8k samples seem to be generated with the v2 chemistry, but processed with different cell ranger versions.
Hi Brandon,
I used the MNN batch correction from the Marioni lab and it's done a reasonably good job of correcting the batch effects. There was one donor that looked out of place, which I've removed (for better or for worse). Now I get good deconvolution with bisque. One thing I noticed though was that all cell populations needed to be present in all of the donors, which could present issues with either rare cell populations or populations that are affected by treatment or time or other effects. Something to consider maybe.
Thanks, Miha
Hi Miha,
Glad to hear and thanks for the insight. We'll put some more thought into rare cell populations only detected in a subset of the individuals used for the reference. In the next update we'll allow the reference profile to be built in these cases and notify the user.
Thanks again! Brandon
Dear Miha,
Could you share with me the Seurat object that you eventually used to deconvolut with bisque and determine the cell fractions from your PBMCs? I am having many issues trying to generate a reference dataset (the Seurat object) that allows bisque to properly work and deconvolut my RNAseq data from PBMCs.
Thanks,
Aleix
Hi Aleix,
Thanks for trying out the package. What issues are you running into? I might be able to help with them.
Thanks, Brandon
Hi Brandon,
I was trying to perform the refarence-based decomposition using the available single cell RNA seq dataset available from the the Seurat tutorial (the pbmc_3k dataset).
sc.eset <- BisqueRNA::SeuratToExpressionSet(pbmc3k, delimiter="-", position=2, version="v3")
I converted my expression data (data.frame) into a matrix and then into an ExpressionSet format by:
counts2<-as.matrix(counts) bulk.eset <- Biobase::ExpressionSet(assayData = counts2)
( "bulk.eset" returns: ) ExpressionSet (storageMode: lockedEnvironment) assayData: 52422 features, 465 samples element names: exprs protocolData: none phenoData: none featureData: none experimentData: use 'experimentData(object)' Annotation:
However, when I run:
res <- BisqueRNA::ReferenceBasedDecomposition(bulk.eset, sc.eset, markers=NULL, use.overlap = FALSE)
I get the following error:
Error in GetOverlappingGenes(sc.eset, bulk.eset, markers, verbose) : No overlapping genes found between bulk and single-cell expression.
Which I do not understand because "use.overlap" is set to FALSE. Do you know why I get this error? Then from your discussion with Miha I saw you said (on June) that "Currently, the reference-based decomposition requires multiple individuals in the single-cell data in order to learn the decomposition transformation." which I guess is still true.
I am currently trying to obtain a good Seurat object to use as reference from multiple individuals, but I haven't been successful yet. For this reason I would like to know if Miha (or somobody else) could make available the Seurat object he (or others) used to eventually get good deconvolution with bisque for PBMCs.
Thanks for your help,
Aleix
Hi Aleix,
The "use.overlap" parameter is for indicating overlapping individuals between the single-cell and bulk experiments. In this case, the issue is due to no overlapping genes between the datasets (perhaps due to naming differences). You can compare the gene names in each dataset with Biobase::featureNames(eset)
In terms of number of individuals, we have found in our experiments that less than 3 samples used as a reference may be inaccurate to use with our model.
Let me know if you need anything else
Thanks, Brandon
Hi,
This is my first time analyzing RNAseq data and I was wondering if anyone could help me. I have bulk RNA seq data that I received after sorting a specifically tagged cell population. I would like to determine what is the cell composition of that sample to help understand what type of cell it is through RNA sequencing. How can I use Bisque to help determine the cell composition of my samples? Also, how should the input data look like?
Thank you for your time!
Hi @gruben93, thanks for your interest in our method and welcome to the RNAseq world! Given that your RNAseq was done on a sorted cell population, Bisque would not be useful here (it's designed for quantifying cell types within unsorted bulk tissue samples). You should be able to tell what cell type your sample is composed of based off of the tag used to do the sorting.
If you're concerned about the sorting, you may be able to confirm the cell type by checking if the expected cell type markers (https://panglaodb.se/markers.html?cell_type=%27choose%27) are expressed. One issue with this approach is that it is not clear if the expression of these genes is significantly upregulated if there are no samples from other cell types to compare to. Let me know if you have any other questions
Hi guys,
Kudos for developing this package. I'm really interested to test it on our data. I've had a couple of gos and I'm experiencing some issues though. I'm looking to determine the cell fractions from PBMCs by supplying a publicly a available single cell RNA seq dataset as reference (e.g. the pbmc_3k dataset from the Seurat tutorial), where we have clustered the cells and determined their putative cell types. What I've noticed is that I get nearly identical bulk.props for all the samples, which doesn't seem right. I've tried it on one of our datasets as well as a publicly available one (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE115259) and I get nearly exactly the same bulk.prop values across the board.
I'm guessing that this could be because the scRNA seq dataset is not from any of the bulk RNA seq samples, however I imagine this would be the major use case for running deconvolution as long as the reference samples are from the same cell types?
Thanks, Miha