Large difference in two computations of fateSelectionTest()

Alexis-Varin commented 1 month ago

Hello, thank you for your package! I have a problem with fateSelectionTest().

I ran the function twice with exactly the same parameters, one run right after the other in the same R session, on my sce object with 7 lineages and 2 conditions, and got completely different results: fate = fateSelectionTest(sds, sds$treatment, pairwise = T)

First run

   pair statistic      p.value
1   All 0.5504616 9.103700e-20
2  1vs2 0.5195933 1.882083e-03
3  1vs3 0.4877802 9.647713e-01
4  1vs4 0.5024856 3.375038e-01
5  1vs5 0.5025000 3.482903e-01
6  1vs6 0.5050421 2.056162e-01
7  1vs7 0.5116028 3.958415e-02
8  2vs3 0.5015237 4.132781e-01
9  2vs4 0.5297462 1.967655e-06
10 2vs5 0.5149250 1.474849e-02
11 2vs6 0.5298730 3.332745e-06
12 2vs7 0.4954645 7.442753e-01
13 3vs4 0.5068593 1.407236e-01
14 3vs5 0.4996844 5.160231e-01
15 3vs6 0.5118422 3.524381e-02
16 3vs7 0.5104678 6.759537e-02
17 4vs5 0.5180069 3.073451e-03
18 4vs6 0.5218328 2.944633e-04
19 4vs7 0.5141452 1.031571e-02
20 5vs6 0.5232433 1.765088e-04
21 5vs7 0.5024543 3.543140e-01
22 6vs7 0.5092429 7.085858e-02

Second run, immediately after, and completing in about 30 minutes just like the first one

   pair statistic      p.value
1   All 0.5589915 2.503690e-26
2  1vs2 0.5088816 9.560440e-02
3  1vs3 0.5012838 4.298301e-01
4  1vs4 0.5096613 5.290777e-02
5  1vs5 0.4871519 9.798393e-01
6  1vs6 0.5119615 2.557943e-02
7  1vs7 0.5149129 1.199845e-02
8  2vs3 0.5153948 1.707353e-02
9  2vs4 0.5317502 4.213494e-07
10 2vs5 0.5209220 1.132124e-03
11 2vs6 0.5201694 1.174675e-03
12 2vs7 0.5161514 1.057750e-02
13 3vs4 0.5164933 4.690563e-03
14 3vs5 0.5070373 1.450384e-01
15 3vs6 0.5052387 2.136720e-01
16 3vs7 0.5077388 1.350167e-01
17 4vs5 0.5039175 2.776432e-01
18 4vs6 0.5330868 9.283741e-08
19 4vs7 0.5121956 2.295559e-02
20 5vs6 0.5266183 2.130505e-05
21 5vs7 0.5009598 4.436396e-01
22 6vs7 0.5059306 1.731157e-01

I was wondering whether I need to set a seed? But how can the results be so different? The p-values of some pairs differ by up to 4 orders of magnitude (and 2vs6, the pair that interests me the most, is the one that changes the second most...)

Also, how do I interpret the results? What do the statistic and the p-value represent? How should I show the results in a figure (bar plot? scatter plot?). I am especially interested in knowing whether, under treatment, the differentiation pattern shifts from one preferred lineage in control to another one in treated. I can plot the curve weights vs density as you show in your tutorials, but knowing how to also represent the results of fateSelectionTest() would help me a lot :)

Thanks

HectorRDB commented 4 weeks ago

Hi, as described in the paper, the default test for fateSelectionTest is a classifier test. That means that we split the cells into a train set and a test set, and train a classifier (by default a random forest), with each cell represented by its vector of lineage weights. We then measure the classifier's accuracy on the test set and compare it with what is expected under the null (random attribution if there is no difference). So the statistic is the accuracy of that classifier. It can vary from run to run, as both the random forest and the train-test split are random. Because the null model is a binomial, small variations in the tails lead to large differences in p-value. I would refer you to the discussion in the condiments paper:

We stress that, rather than attaching strong probabilistic interpretations to p-values (which, as in most RNA-Seq applications, would involve a variety of hard-to-verify assumptions and would not necessarily add much value to the analysis), we view the p-values produced by the condiments workflow as useful numerical summaries for guiding the decision to fit a common trajectory or condition-specific trajectories and for exploring trajectories across conditions and identifying genes for further inspection.
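
To make the point about the binomial tails concrete, here is a minimal sketch in base R. It does not call condiments. The number of test cells, the "true" accuracy, the chance level of 0.5 (two balanced conditions), and the helper names `binom_pvalue` and `simulate_run` are all assumptions made for illustration, and the package's exact p-value computation may differ from this one-sided binomial test.

```r
# Hypothetical values, chosen for illustration only (not taken from the thread).
n_test <- 5000         # number of held-out test cells
true_accuracy <- 0.52  # accuracy the classifier would reach on average
chance <- 0.5          # accuracy expected under the null (two balanced conditions)

# One-sided binomial p-value: probability of at least `n_correct` correct
# predictions out of `n` if the classifier were only guessing.
binom_pvalue <- function(n_correct, n, prob = chance) {
  pbinom(n_correct - 1, size = n, prob = prob, lower.tail = FALSE)
}

# One simulated run. The seed stands in for the randomness of the
# train/test split and of the random forest.
simulate_run <- function(seed) {
  set.seed(seed)
  n_correct <- rbinom(1, size = n_test, prob = true_accuracy)
  c(statistic = n_correct / n_test,
    p.value = binom_pvalue(n_correct, n_test))
}

# Different seeds: similar accuracies, but p-values that can differ a lot.
runs <- t(sapply(1:5, simulate_run))
print(runs)

# Same seed: identical result.
print(identical(simulate_run(1), simulate_run(1)))

# Deterministic view of the same effect on a grid of accuracies.
accuracies <- c(0.50, 0.51, 0.52, 0.53)
grid_p <- binom_pvalue(round(accuracies * n_test), n_test)
print(data.frame(statistic = accuracies, p.value = grid_p))
```

With these assumed numbers, moving the accuracy from 0.52 to 0.53 lowers the p-value by roughly two orders of magnitude. In the setting of this thread, calling set.seed() right before fateSelectionTest() would be the usual way to try to make a run repeatable, assuming the classifier backend draws from R's random number generator.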

Alexis-Varin commented 4 weeks ago

I see, thanks. I have many more questions regarding my analysis, but I will close this issue and perhaps open a new one, as they do not relate to fate selection.

HectorRDB / condiments

Large difference in two computations of fateSelectionTest() #33