bulk RNA deconvolution - Githubissues

HanyingYan commented 1 month ago

Hi team,

In the Redeconve manual.ipynb, I noticed that you mentioned Redeconve can also be used for bulk RNA-seq deconvolution. Does this mean we can use scRNA as reference to deconvolute bulk RNA data instead of ST?

If so, do you have any sample codes to run this kind of deconvolution? Because it looks to me that the deconvoluting() function only accepts parameters 'ref' and 'st'.

Thanks, Hanying

ZxZhou4150 commented 1 month ago

Hi Hanying,

Redeconve does not require spatial information when performing deconvolution, which makes it straightforward to parallelize and therefore efficient. So, in principle, it can be used for bulk RNA-seq deconvolution as well. All you need to do is substitute your gene-by-sample matrix from bulk RNA-seq for the st matrix, which is supposed to be a gene-by-spot matrix, and keep everything else the same.
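A minimal sketch of that substitution, using toy matrices invented for illustration. Only the deconvoluting() call comes from the manual as quoted in this thread, and it is commented out so the snippet runs without the package. The gene-matching step is an assumption, not a documented requirement.

```r
# Toy data, invented for illustration only.
set.seed(1)
genes <- paste0("gene", 1:6)

# Reference: gene-by-cell count matrix from scRNA-seq.
sc <- matrix(rpois(6 * 4, lambda = 5), nrow = 6,
             dimnames = list(genes, paste0("cell", 1:4)))

# Bulk: gene-by-sample count matrix, used in place of the gene-by-spot st matrix.
bulk <- matrix(rpois(5 * 3, lambda = 50), nrow = 5,
               dimnames = list(genes[2:6], paste0("sample", 1:3)))

# Assumption: restrict both matrices to their shared genes, in the same order.
shared   <- intersect(rownames(sc), rownames(bulk))
sc_use   <- sc[shared, , drop = FALSE]
bulk_use <- bulk[shared, , drop = FALSE]

# The call itself, as in the manual but with the bulk matrix passed as `st`
# (commented out so this snippet runs without Redeconve installed):
# res <- deconvoluting(sc_use, bulk_use, genemode = "def", hpmode = "def",
#                      dopar = TRUE, ncores = 8)
```

Columns of the result would then correspond to bulk samples rather than spots.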

Hope this will help you. Feel free to ask if you have further questions.

Zixiang

HanyingYan commented 3 weeks ago

Hi Zixiang,

Thank you for the instructions. I was able to apply deconvoluting() to my bulk RNA-seq data by replacing st with the bulk matrix. However, I still have several questions.

1. I replicated your code using the sample data as in Redeconve manual.ipynb. However, I noticed that the deconvolution results res.ctmerge and res.ct (using get.ref()) are somewhat different. Do you think the former is more accurate, and should we stick to it when the reference is not very large?
2. For the parameter normalize, you mentioned that Redeconve can also be used for bulk RNA-seq deconvolution, that normalization of the reference is not required in that case, and that normalization is recommended when deconvoluting spatial transcriptomics. Can you explain why? Both my bulk and sc data are count-level. If I set normalize=F, the proportions for every cell type in the annotation (N types) are generally around 1/N. However, if I set normalize=T, the results are more reasonable and more varied.
3. Do you have any advice on values for other parameters, such as var_thresh and exp_thresh, for bulk RNA data deconvolution? How do you evaluate the results with different hyperparameters?

Thanks, Hanying

ZxZhou4150 commented 3 weeks ago

Hi Hanying,

You are raising very valuable questions. Here are my answers or suggestions:

1. They are of course different if you consider how we get them. For res.ct, we first get the average expression of all cells in a cell type through get.ref(), then use this averaged expression profile to get the abundance. For res.ctmerge, we first get the abundance of each cell from its own expression profile, then merge the abundances by cell type. The two results are the same only when the abundance of every cell in a cell type is exactly identical. In short, res.ctmerge is more accurate, and res.ct is an alternative when computing resources are insufficient.
2. This is a very good question. Our initial thought about normalization was that the scales of bulk, sc and st RNA-seq data are very different (specifically, the capture rate decreases from bulk to sc to st, while the sparsity goes the opposite way), so we were not sure whether the normalization strategy used for st deconvolution with sc would also work on bulk. Your case is a good example showing that the normalization strategy is generic. We are considering dropping the suggestion not to normalize for bulk deconvolution.
3. This is also a good question. var_thresh can remain the same because it concerns the variance of the reference. exp_thresh, which sets the minimum total count of a gene, can be much larger than 0.003 (default) because you are using bulk, not st. We don't have a suggestion for a specific value, but you can try a set of numbers to see which one results in a satisfactory number of genes.
4. As for evaluating hyperparameters, you can refer to the "Determination of the hyperparameter" section in the "Methods" part of our paper for a detailed explanation of how we select the hyperparameter in autoselection mode. Basically, it is based on the residual of the LS (least-squares) term.
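The two routes in the first point can be illustrated with base R alone. This is only a sketch with made-up numbers: annot, ref_cell and abund_cell are hypothetical, merging is assumed to mean summing within a cell type, and neither get.ref() nor Redeconve's own merging code is reproduced here.

```r
# Hypothetical cell-type labels for four reference cells.
annot <- c("T", "T", "B", "B")

# Gene-by-cell reference expression (made-up values; filled column by column).
ref_cell <- matrix(c(10, 0,
                     6, 4,
                     1, 9,
                     3, 7),
                   nrow = 2,
                   dimnames = list(c("g1", "g2"), paste0("cell", 1:4)))

# Route behind res.ct: average the cells of each type first, giving a
# gene-by-cell-type profile that would then serve as the reference.
ref_type <- sapply(split(seq_along(annot), annot),
                   function(idx) rowMeans(ref_cell[, idx, drop = FALSE]))

# Route behind res.ctmerge: per-cell abundances (cell-by-sample; made up here,
# standing in for a per-cell deconvolution result) are merged within each type.
# Assumption: "merged" means summed.
abund_cell <- matrix(c(0.9, 0.1, 0.2, 0.0,
                       0.1, 0.3, 0.5, 0.4),
                     nrow = 4,
                     dimnames = list(paste0("cell", 1:4), c("s1", "s2")))
abund_merged <- rowsum(abund_cell, group = annot)
```

In this toy case the two T cells have very different abundances in s1 (0.9 and 0.1), which is exactly the situation where a single averaged profile cannot stand in for the individual cells.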

Feel free to ask if you have further questions.

Zixiang

HanyingYan commented 2 weeks ago

Hi Zixiang,

1. For res.ctmerge and res.ct, can I always run to.proportion() to get proportions? I ask because, for each column (spot or bulk RNA sample), the values don't sum to 1. And when you visualize the results using spatial.piechart(res.ctmerge, coords), it seems that you are using the proportions.
2. I see. I will always set normalize=T then.
3. It seems that exp_thresh is 0.03 by default, not 0.003. Can you double-check? Also, I noticed that your sample code uses res <- deconvoluting(sc, st, genemode = "def", hpmode = "def", dopar = T, ncores = 8), but the explanation says the 3 modes are 'default', 'customized' and 'filtered'. So I wonder if it can recognize 'def' as 'default' and use all genes instead of filtering with default thresholds.
4. I was using default, but I will also try autoselection to see how it goes!
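For reference, rescaling each column so that it sums to 1 can be sketched in base R as below. This is a generic illustration with made-up numbers, not the implementation of to.proportion(), whose internals are not shown in this thread.

```r
# Made-up cell-type-by-sample abundances whose columns do not sum to 1.
abund <- matrix(c(2, 1, 1,
                  0.5, 1.5, 3),
                nrow = 3,
                dimnames = list(c("T", "B", "Mono"), c("s1", "s2")))

# Divide every column by its own total.
prop <- sweep(abund, 2, colSums(abund), "/")

# Each column of prop now sums to 1.
colSums(prop)
```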

Thanks, Hanying

ZxZhou4150 commented 2 weeks ago

Because we normalized the data, the result returned directly by deconvoluting has only relative significance, which is why the "Gaining interpretability" section is needed. to.proportion is always a good choice for visualization and downstream analyses. You can also use to.absolute.abundance if you have some prior knowledge of approximately how many cells there are in one st spot. spatial.piechart automatically normalizes the results so that each column sums to 1, as a pie chart requires.
Thanks so much for pointing that out! That was negligence on our part. The default value should be 0.003. In the function gene.filter, which actually filters the genes, this value is 0.003, but we mistakenly set it to 0.03 in the function deconvoluting. We will correct that. As for def and default, the built-in R function match.arg matches your input against a set of candidate values and accepts unambiguous abbreviations, so def is treated as default.
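The match.arg behavior can be checked with a small stand-alone function. pick_mode below is a hypothetical stand-in, not Redeconve code; its candidate values are just the three mode names mentioned in this thread.

```r
# Hypothetical function with a mode argument resolved through match.arg().
pick_mode <- function(genemode = c("default", "customized", "filtered")) {
  match.arg(genemode)
}

pick_mode("def")   # partial match, resolves to "default"
pick_mode("filt")  # partial match, resolves to "filtered"
pick_mode()        # nothing supplied, falls back to the first candidate
```

An input that matches none of the candidates makes match.arg raise an error rather than silently picking a mode.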

ZxZhou4150 / Redeconve

bulk RNA deconvolution #6