ZxZhou4150 / Redeconve

Deconvolution of spatial transcriptomics at single-cell resolution
MIT License

Reference requirements and thresholding enquiry #4

Closed. MscIlaria closed this issue 1 month ago.

MscIlaria commented 2 months ago

Dear Team,

Thank you for developing such a great tool.

Are there specific requirements for an optimal single-cell reference? For example, what is the minimum cell_count per cell_type required? Does the proportion of cell_types within the reference matter? Also, what would you look at when deciding on the thresholds?

Thank you, Ilaria

ZxZhou4150 commented 2 months ago

Hi Ilaria,

Thank you for your kind words about our work! Here are my answers to your questions:

  1. Reference requirements: there is NO hard and fast rule for the reference.

1.1 There is NO minimum cell count per cell type. If you want to downsample the reference to a few thousand cells, you can use the function cell.sampling with the parameter prot = T, which guarantees that at least one cell from each cell type is chosen (see section 2.2 of our manual for details). However, we cannot guarantee that every sampled cell will appear in the result, owing to the low capture rate of ST and the sparsity of our result.
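To make the "at least one cell per cell type" guarantee concrete, here is a minimal Python sketch of protected stratified downsampling. It is a hypothetical illustration: the function `sample_cells_protected` and its arguments are assumptions, not Redeconve's `cell.sampling` (an R function whose exact sampling scheme may differ).

```python
import random
from collections import defaultdict


def sample_cells_protected(cell_types, n_target, seed=0):
    """Pick about n_target cell indices, keeping at least one cell of every type.

    Hypothetical sketch, not Redeconve's cell.sampling.
    cell_types: sequence where cell_types[i] is the cell-type label of cell i.
    If n_target is smaller than the number of cell types, one cell per type is
    still returned, so the result can exceed n_target in that case.
    """
    rng = random.Random(seed)
    members_by_type = defaultdict(list)
    for idx, label in enumerate(cell_types):
        members_by_type[label].append(idx)

    # Protection step: one randomly chosen cell from each cell type.
    chosen = {rng.choice(members) for members in members_by_type.values()}

    # Fill the rest of the budget uniformly at random from the unchosen cells.
    remaining = [i for i in range(len(cell_types)) if i not in chosen]
    n_extra = max(0, min(n_target - len(chosen), len(remaining)))
    chosen.update(rng.sample(remaining, n_extra))
    return sorted(chosen)


# A reference with one very rare type that plain random sampling could miss.
labels = ["T cell"] * 500 + ["B cell"] * 300 + ["rare type"] * 2
picked = sample_cells_protected(labels, n_target=100)
print(len(picked), sorted({labels[i] for i in picked}))
```

Without the protection step, a 100-cell uniform sample from this 802-cell reference would often contain neither of the two "rare type" cells. As noted above, keeping a cell in the reference still does not guarantee it is detected in the deconvolution result.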

1.2 The proportion of cell types is NOT important. In theory, if the scRNA-seq reference and the ST data are paired (i.e., from the same tissue of the same patient), the cell-type proportions in the sc reference can also reflect the cell-type proportions of the ST data, so in that case the proportions are informative. In most cases, however, the sc reference and the ST data are not paired, so the proportions do not matter. In our paper we showed that Redeconve is robust to the choice of reference, even when the cell-type proportions of the references vary considerably (see Fig. 3d and Supplementary Fig. 21 for details).

  2. Thresholding: there are two kinds of thresholds in the main function. Which one do you mean?

2.1 var_thresh and exp_thresh in the "gene selection" part: these thresholds are used to identify highly variable genes. Their default values are 0.025 and 0.003, respectively, adapted from another state-of-the-art (SOTA) method, RCTD (implemented in the spacexr package; see their paper and GitHub page). These defaults typically yield about 3,000 genes. If you find the number of genes too large or too small, you can adjust these values as needed. Note also that gene selection is not a mandatory step: you can use the whole transcriptome for deconvolution and bypass these thresholds, which does not substantially affect running efficiency (see section 1.2 of our manual).
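Raising either threshold shrinks the selected gene set, and lowering it enlarges the set. The Python sketch below shows that relationship under stated assumptions. It treats `exp_thresh` as a cutoff on a gene's mean normalized expression and `var_thresh` as a cutoff on its variance across cells, and requires a gene to pass both. The function `select_variable_genes` and these exact statistics are hypothetical. Redeconve's R implementation, and RCTD's from which the defaults were adapted, may normalize and threshold differently.

```python
import numpy as np


def select_variable_genes(expr, gene_names, var_thresh=0.025, exp_thresh=0.003):
    """Hypothetical highly-variable-gene filter (not Redeconve's code).

    expr: 2-D array of normalized expression, genes x cells.
    Keeps genes whose mean expression exceeds exp_thresh AND whose variance
    across cells exceeds var_thresh. Which statistics Redeconve actually
    thresholds is an assumption here.
    """
    expr = np.asarray(expr, dtype=float)
    gene_mean = expr.mean(axis=1)  # average expression of each gene
    gene_var = expr.var(axis=1)    # variance of each gene across cells
    keep = (gene_mean > exp_thresh) & (gene_var > var_thresh)
    return [name for name, k in zip(gene_names, keep) if k]


expr = np.array([
    [0.00, 0.00, 0.00],  # GeneA: never expressed
    [0.50, 0.00, 0.10],  # GeneB: expressed and variable
    [0.01, 0.01, 0.01],  # GeneC: expressed but constant
])
print(select_variable_genes(expr, ["GeneA", "GeneB", "GeneC"]))
```

Tightening `var_thresh` (for example to 0.1 in this toy matrix) drops GeneB as well. This is the kind of adjustment described above when the gene count comes out too large.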

2.2 thre: this threshold is used to avoid false positives. The raw output of our algorithm contains no exact zeros, only very small values (e.g., 1e-20). These do not represent true cell occurrences, and this parameter filters them out. Typically you don't need to adjust it. See section 1.6 of the manual.
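Conceptually, a filter like `thre` acts as a hard cutoff on the raw abundance matrix. The Python sketch below illustrates that idea. Several details are assumptions: the function name `filter_tiny_abundances`, the matrix orientation (reference cells × spots), and the strict `<` comparison. The value 1e-10 is chosen only for the example and is not Redeconve's default.

```python
import numpy as np


def filter_tiny_abundances(abundance, thre):
    """Set entries below `thre` to zero (hypothetical illustration).

    abundance: array of estimated abundances, e.g. reference cells x spots,
               where near-zero values such as 1e-20 stand in for "absent".
    Returns a filtered copy; the input array is left unchanged.
    """
    out = np.array(abundance, dtype=float)  # np.array copies by default
    out[out < thre] = 0.0                   # treat tiny values as non-occurrences
    return out


raw = np.array([[1e-20, 3.2],
                [0.7, 5e-15]])
print(filter_tiny_abundances(raw, thre=1e-10))
```

Because the spurious values sit many orders of magnitude below genuine abundances, the exact cutoff rarely changes which entries survive. That is consistent with the advice that this parameter usually needs no tuning.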

I hope this addresses your concerns. If you have further questions, do not hesitate to reply in this thread.

Zixiang