Danko-Lab / BayesPrism

A Fully Bayesian Inference of Tumor Microenvironment composition and gene expression

Inaccurate pseudobulk deconvolution with raw counts #105

Open emmanuelmekasha opened 3 weeks ago

emmanuelmekasha commented 3 weeks ago

Hello,

Thank you very much for your package. My group is very interested in using this method in our neuroscience work. We ran some preliminary deconvolution benchmarks and observed some surprising behavior.

Using the ROSMAP brain dataset (an scRNA-seq dataset), I generated a pseudobulk matrix by aggregating the columns (cells) of the same cell type. I gave BayesPrism the scRNA-seq matrix as the reference and the pseudobulk matrix as the mixture to deconvolve. BayesPrism was fairly inaccurate (~10% RMSE) on this simple deconvolution task, while regression methods predicted the correct proportions.
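For anyone trying to reproduce the benchmark, here is a minimal sketch of the setup described above, not the original code. The toy count matrix, the cell-type labels, the helper names `make_pseudobulk` and `rmse`, and the example estimate are all hypothetical. The ground truth is taken to be the cell-count fraction, which is itself an assumption about how the benchmark was scored.

```python
from collections import Counter
from math import sqrt

def make_pseudobulk(counts, cell_types, cell_idx):
    """Sum raw counts over the chosen cells.

    counts[g][c] is the raw count of gene g in cell c.
    Returns (per-gene totals, true cell-count fraction per cell type).
    """
    bulk = [sum(row[c] for c in cell_idx) for row in counts]
    tally = Counter(cell_types[c] for c in cell_idx)
    n = len(cell_idx)
    truth = {ct: tally[ct] / n for ct in sorted(set(cell_types))}
    return bulk, truth

def rmse(estimated, truth):
    """Root-mean-square error between estimated and true fractions."""
    return sqrt(sum((estimated[ct] - truth[ct]) ** 2 for ct in truth) / len(truth))

# Hypothetical toy data: 3 genes x 4 cells.
counts = [
    [5, 7, 0, 1],  # gene A
    [0, 1, 9, 8],  # gene B
    [2, 2, 3, 3],  # gene C
]
cell_types = ["neuron", "neuron", "astrocyte", "astrocyte"]

# One pseudobulk sample built from cells 0, 1 and 2 (two neurons, one astrocyte).
bulk, truth = make_pseudobulk(counts, cell_types, [0, 1, 2])

# Hypothetical deconvolution output to score against the truth.
estimate = {"neuron": 0.6, "astrocyte": 0.4}
error = rmse(estimate, truth)
```

Sharing the equivalent of `bulk`, `truth` and the estimated fractions for a few samples would make the discrepancy easier to diagnose.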

Interestingly, when we gave BayesPrism the pseudobulk matrix generated by Seurat's AggregateExpression method (which involves several normalization steps, including log normalization), BayesPrism was right on the mark. The documentation, however, generally seems to recommend against normalization.

Help would be greatly appreciated. This seems like odd behavior for a robust method. I can share my code or any other specifics.

Thank you.

tinyi commented 1 week ago

Hi Emmanuel,

Thank you for your questions.

Here are a few possibilities, based on my best guess.

First of all, log transformation should be avoided for all deconvolution methods, because log(a+b) ≠ log(a) + log(b): bulk expression is a sum of contributions from each cell type, and that additivity only holds on the linear scale. However, if the log-transformed input performs better than the raw counts, one possible cause is extreme outlier genes. If you have already removed ribosomal and mitochondrial genes (using the function provided by BayesPrism), you can increase the outlier-filtering cutoff when calling the new.prism function.
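The additivity point can be illustrated with a small sketch. This is a toy two-cell-type least-squares fit, not BayesPrism's model; the expression profiles, the mixing fraction and the helper `fit_fraction` are made up for illustration.

```python
from math import log1p

# Hypothetical expression profiles for two cell types over three genes.
neuron = [100.0, 2.0, 10.0]
astro = [1.0, 80.0, 10.0]

# Mix them on the linear scale with a known neuron fraction.
true_f = 0.3
mix = [true_f * n + (1 - true_f) * a for n, a in zip(neuron, astro)]

def fit_fraction(x, y, m):
    """Least-squares estimate of f in m ~ f*x + (1-f)*y."""
    num = sum((mi - yi) * (xi - yi) for xi, yi, mi in zip(x, y, m))
    den = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return num / den

# On the linear scale the mixture is exactly additive, so f is recovered.
linear_fit = fit_fraction(neuron, astro, mix)

# After log1p the same mixture is no longer a weighted sum of the
# transformed profiles, so the fitted fraction is biased.
log_fit = fit_fraction(
    [log1p(v) for v in neuron],
    [log1p(v) for v in astro],
    [log1p(v) for v in mix],
)
```

In this toy case `linear_fit` returns the true fraction, while `log_fit` lands noticeably away from it.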

Could you also provide the per-cell-type correlation coefficient between the estimated and true fractions, for both the raw-count and the log-scale results?
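One way to compute this is sketched below, assuming the Pearson correlation across pseudobulk samples is what is meant. The fractions are hypothetical placeholders rather than results from this benchmark, and `pearson` is a hand-written helper.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical fractions for three pseudobulk samples, per cell type.
truth = {"neuron": [0.2, 0.5, 0.7], "astrocyte": [0.8, 0.5, 0.3]}
estimate = {"neuron": [0.3, 0.6, 0.8], "astrocyte": [0.7, 0.4, 0.2]}

# One coefficient per cell type, computed across samples.
per_type_r = {ct: pearson(truth[ct], estimate[ct]) for ct in truth}
```

In this made-up example every estimate is off by 0.1, yet the correlation for each cell type is 1. A high correlation alongside a large RMSE would point to a systematic shift rather than noisy estimates.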

Best.

Tinyi

(In reply to the opening post by Emmanuel Mekasha, sent Tue, Nov 5, 2024 at 1:39 PM.)