Open njrobins opened 11 months ago
Apologies for the delay.
There are two reasons that might have caused the underestimated cell type fraction. The first is the one you mentioned in your email: a sparser scRNA-seq reference. The second is that some cell types have a lower total transcription level. Because BayesPrism estimates the fraction of reads contributed by each cell type rather than the fraction of cells, a cell type with low total transcription will tend to show a seemingly underestimated fraction.
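To make the second point concrete, here is a minimal sketch of how a read fraction and a cell fraction can diverge. The cell counts and per-cell transcript totals are made-up numbers, and the helper names are hypothetical (they are not part of BayesPrism). The rescaling step assumes you have a rough estimate of mean total counts per cell for each cell type, for example from the scRNA-seq reference.

```python
def read_fractions(cell_counts, reads_per_cell):
    """Fraction of total reads contributed by each cell type."""
    reads = {k: cell_counts[k] * reads_per_cell[k] for k in cell_counts}
    total = sum(reads.values())
    return {k: v / total for k, v in reads.items()}


def cell_fractions_from_reads(read_frac, reads_per_cell):
    """Rescale read fractions by mean reads per cell to approximate cell fractions."""
    weights = {k: read_frac[k] / reads_per_cell[k] for k in read_frac}
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()}


# Hypothetical numbers: equal cell counts, but type_B cells carry 4x fewer transcripts.
cell_counts = {"type_A": 500, "type_B": 500}
reads_per_cell = {"type_A": 8000.0, "type_B": 2000.0}

read_frac = read_fractions(cell_counts, reads_per_cell)
cell_frac = cell_fractions_from_reads(read_frac, reads_per_cell)

print(read_frac)  # type_B gets a smaller share of reads despite equal cell counts
print(cell_frac)  # rescaling recovers roughly equal cell fractions
```

In this toy case the two cell types are equally abundant, yet type_B accounts for only about 20% of reads, so a read-based estimate would look low for it without anything being wrong.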
I would recommend looking into the scRNA-seq reference to see if this is the case. Additionally, as a sanity check, you can simply compute the fraction of reads from the marker genes of each cell type in each mixture.
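The marker-gene sanity check could look roughly like the sketch below. It assumes the bulk counts are available as a per-sample mapping from gene to count and that you already have a marker list per cell type. The function name, gene names, and counts are placeholders for illustration only.

```python
def marker_read_fraction(bulk_counts, markers):
    """For each mixture, return the fraction of reads that fall on each cell type's markers.

    bulk_counts: {sample: {gene: count}}
    markers:     {cell_type: [gene, ...]}
    """
    result = {}
    for sample, counts in bulk_counts.items():
        total = sum(counts.values())
        result[sample] = {
            # Genes missing from a sample contribute zero reads.
            cell_type: (sum(counts.get(g, 0) for g in genes) / total if total else 0.0)
            for cell_type, genes in markers.items()
        }
    return result


# Placeholder data: two mixtures, two cell types with two marker genes each.
bulk_counts = {
    "mix1": {"gA1": 300, "gA2": 100, "gB1": 10, "gB2": 5, "other": 585},
    "mix2": {"gA1": 200, "gB1": 50, "other": 750},
}
markers = {"type_A": ["gA1", "gA2"], "type_B": ["gB1", "gB2"]}

fractions = marker_read_fraction(bulk_counts, markers)
for sample, by_type in fractions.items():
    print(sample, by_type)
```

If the marker read fraction of the underrepresented cell type is itself very low in the bulk data, the low deconvolution estimate is at least consistent with the reads, and the reference or the marker choice is the next place to look.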
On Wed, Nov 29, 2023 at 04:20 njrobins wrote:
Hello! I recently employed BayesPrism to deconvolve a bulk RNAseq dataset from a particular region of the embryonic mouse brain. This region (the striatum) houses specific neuronal populations that exhibit a fairly well-documented distribution; namely, spiny projection neurons (SPNs) make up ~95% of neurons and ~50% of total cells in the striatum. SPNs can be functionally divided into two subpopulations; thus, each of these populations should constitute ~20-25% of the total cells in a given striatal sample. Notably, this was the case in the single-cell RNAseq dataset I used as a reference for deconvolution (see Fig. 1B in this paper https://www.nature.com/articles/s41598-023-36255-5).
When I used BayesPrism to deconvolve my bulk dataset with the reference above, one of the two SPN subpopulations was predicted to make up ~20% of the total sample, in line with what I expected. However, the other subpopulation was predicted to be present at a much lower proportion (<1%). This was true across both genotypes I was comparing, suggesting it was not a biological phenomenon attributable to my experimental manipulation. Moreover, it held true whether I used all expressed genes (pre-filtered as described in the BayesPrism tutorial) or selected marker genes (using select.marker) for cell type estimation.
In my mind, this could conceivably be due to low expression or dropout of genes that are expressed selectively in this cell type. However, in my all-genes analysis, after filtering there are still several genes included whose expression is reasonably selective for this cell type over all others. And, to reiterate, the reference dataset contained the expected proportions of both of these cell types, so, in that regard, the reference seems unlikely to have introduced bias into the deconvolution.
Thus, my question is: is there any aspect of the BayesPrism workflow that might tend to systematically underestimate specific cell populations? And, if so, what would be the reason for this, and are there computational methods that might lessen or circumvent such an issue? I am happy to provide additional information and/or code for further clarification. I greatly appreciate any help you can provide!
Thanks so much for your response! One of the cell types in my reference is, I believe, actually a heterogeneous mix of multiple cell types (or a single, highly plastic cell type that expresses markers of other cell lineages). Some of the markers of this population overlapped with those of my underrepresented population, so my cells of interest were being misclassified. I now have a workaround for this that seems to have resolved the issue.
Thank you again!