YosefLab / destvi_utils

Utilities for downstream analysis of destVI
BSD 3-Clause "New" or "Revised" License

Proportion thresholding question #18

Closed PauBadiaM closed 1 year ago

PauBadiaM commented 2 years ago

Hi,

Amazing tool and package! I'm getting better results with DestVI than with other deconvolution tools, and the implementation and documentation are very clean. However, I have a question regarding the thresholding of proportions.

[image: props]

I really like the idea of filtering noisy proportions from the data like in this case; it is something I also observe in my data. However, the output of the automatic_proportion_threshold function is just a dictionary of thresholds. Since proportions depend on each other (if one changes, all of them change so that the total still sums to 1), if I remove some cell types the rest of the proportions will change. What would you recommend? I was thinking of filtering using all thresholds and then recomputing the proportions by dividing each remaining proportion by the sum of the remaining proportions per spot. Is there a better way to do it? Thank you for your time.
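The filter-then-renormalize idea described above can be sketched as follows. This is only a minimal illustration, not part of destvi_utils: the function name `threshold_and_renormalize`, the toy proportions, and the threshold values are all hypothetical, and the only assumption about the real package is that the thresholds come as a dictionary keyed by cell type, as stated above.

```python
import numpy as np


def threshold_and_renormalize(props, cell_types, thresholds):
    """Zero out sub-threshold proportions, then rescale each spot to sum to 1.

    props:       (n_spots, n_cell_types) array of proportions, one row per spot.
    cell_types:  column names, in the same order as the columns of `props`.
    thresholds:  dict mapping cell type -> threshold (hypothetical values here).
    """
    props = np.asarray(props, dtype=float)
    cutoffs = np.array([thresholds[ct] for ct in cell_types])

    # Keep a proportion only if it reaches the threshold of its own cell type.
    kept = np.where(props >= cutoffs, props, 0.0)

    # Divide by the remaining mass per spot. Spots where everything was
    # filtered out stay all-zero instead of triggering a division by zero.
    row_sums = kept.sum(axis=1, keepdims=True)
    return np.divide(kept, row_sums, out=np.zeros_like(kept), where=row_sums > 0)


# Toy data, made up for illustration only.
cell_types = ["B cells", "Monocytes", "T cells"]
props = np.array([
    [0.70, 0.05, 0.25],
    [0.10, 0.10, 0.80],
    [0.05, 0.02, 0.93],
])
thresholds = {"B cells": 0.2, "Monocytes": 0.07, "T cells": 0.1}

rescaled = threshold_and_renormalize(props, cell_types, thresholds)
print(rescaled.round(3))
```

Note that rescaling inflates the surviving proportions (in the first toy spot, B cells go from 0.70 to 0.70 / 0.95), so the rescaled values are no longer directly comparable to the unthresholded model output.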

canergen commented 2 years ago

Thanks for your question. Indeed, currently we don't rescale after thresholding. If you need rescaled versions, I would suggest rescaling after thresholding the proportions for each cell. However, we currently keep the proportions without thresholding and use the threshold in downstream functions to allow for sparse results and to only take into consideration spots with a certain amount of each cell type. From my side, this makes sense, as the automatic threshold is a first guess of the threshold value, but you get cleaner DE results by manually adjusting the threshold value (for your example above, 0.2 might actually look cleaner). The automatic threshold tends to be too high for highly abundant cell types.

PauBadiaM commented 2 years ago

Thanks for the quick reply @cane11! Then, would you recommend setting the thresholds manually after looking at the distribution of the obtained proportions? For instance, in the previous example there is a clear bimodal distribution that can be separated by setting, like you mentioned, the threshold to 0.2. However, I wonder what happens in these other situations:

[image: props]

Here it is not so clear where to cut. What could be a good rule of thumb?

canergen commented 2 years ago

I'm sorry for missing the follow-up. My impression so far is that it is not very critical for performance whether you set it to 0.1 or 0.2 for Bcells (it changes p-values, but not so much the ordering of genes). Along the same lines, for monocytes here it is not critical whether it is 0.07 or 0.12 (a higher threshold impedes statistical analysis, as the number of spots gets very low). We currently don't have a good heuristic except taking the automatic threshold for lowly abundant cell types and a user-defined one for highly abundant cell types. While user-selected hyperparameters are the standard for a lot of decisions in sc-analysis (say number of hvg, type of preprocessing), it's unfortunate not to have a better heuristic, and we are very open to user ideas for better automatic heuristics. Currently the threshold can be selected by biological question: let's say you are interested in the phenotype of monocytes in Bcell follicles in this dataset. It would be critical to set the threshold very low, as the expected fraction of monocytes in this region is very low. This will impede overall performance but should give you some idea of how monocytes might differ in this region.
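One way to weigh the trade-off mentioned above (a higher threshold leaves fewer spots for the statistical analysis) is to count how many spots survive each candidate threshold before choosing one. The sketch below is a hypothetical helper on made-up numbers, not a destvi_utils function; `spots_above` and the toy monocyte proportions are assumptions for illustration only.

```python
import numpy as np


def spots_above(props_one_type, candidate_thresholds):
    """Count the spots whose proportion reaches each candidate threshold."""
    props_one_type = np.asarray(props_one_type, dtype=float)
    return {t: int((props_one_type >= t).sum()) for t in candidate_thresholds}


# Toy monocyte proportions across ten spots (made up for illustration).
monocyte_props = np.array([0.01, 0.03, 0.06, 0.08, 0.09, 0.11, 0.13, 0.20, 0.02, 0.04])

counts = spots_above(monocyte_props, [0.05, 0.07, 0.12])
for threshold, n_spots in counts.items():
    print(f"threshold {threshold}: {n_spots} spots retained")
```

If the count at a candidate threshold is already small, lowering the threshold trades cleaner spots for more statistical power, which matches the monocytes-in-follicles example above.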