Teichlab / celltypist

A tool for semi-automatic cell type classification
https://www.celltypist.org/
MIT License

Deterministic results? #10

Closed bbimber closed 2 years ago

bbimber commented 2 years ago

Hello,

This is admittedly a picky question. We're experimenting with running celltypist from the command line to score cells. In our test data we have a rare category with ~4 cells. These are consistently scored as Tcm/Naive cytotoxic T cells in 'predicted labels'. However, the result of majority_voting is not deterministic: some of the time these 4 cells get lumped into another category. This by itself is not a huge problem (in reality they are probably ambiguous cells, and there are only 4 of them). My question is about the run-to-run inconsistency. The input we give celltypist does not have a neighborhood graph, etc., so celltypist creates it for us. Are there any instances where we can or should be setting a random seed or something similar? Thanks

ChuanXu1 commented 2 years ago

@bbimber, I don't think there is any stochasticity in the over-clustering or the downstream majority voting. Can you confirm (maybe paste the code/steps here)?
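
For readers following the thread, here is a minimal sketch of why a small group of cells can change label after majority voting even when the per-cell predictions stay the same. It assumes that majority voting gives every cell the most common predicted label within its over-clustered subcluster. This is not celltypist's actual code: the function name `majority_vote`, the label "Other T cells", and the cluster assignments are hypothetical and only illustrate the idea.

```python
from collections import Counter, defaultdict


def majority_vote(predicted_labels, clusters):
    """Give each cell the most common predicted label in its cluster."""
    labels_by_cluster = defaultdict(list)
    for label, cluster in zip(predicted_labels, clusters):
        labels_by_cluster[cluster].append(label)
    # Pick the dominant label for every cluster.
    winner = {
        cluster: Counter(labels).most_common(1)[0][0]
        for cluster, labels in labels_by_cluster.items()
    }
    return [winner[cluster] for cluster in clusters]


# Per-cell predictions are identical in both runs (hypothetical data).
predicted = ["Tcm/Naive cytotoxic T cells"] * 4 + ["Other T cells"] * 6

# Run A: the 4 rare cells form their own subcluster.
clusters_a = [0] * 4 + [1] * 6
# Run B: the 4 rare cells are merged into the larger subcluster.
clusters_b = [1] * 10

voted_a = majority_vote(predicted, clusters_a)
voted_b = majority_vote(predicted, clusters_b)

# Report which cells changed label between the two runs.
changed = [i for i, (a, b) in enumerate(zip(voted_a, voted_b)) if a != b]
print("cells whose voted label differs:", changed)
```

In this sketch the only thing that differs between the two runs is the cluster assignment, so any environment difference that shifts the neighborhood graph or the clustering would be enough to move a handful of cells into another category.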

bbimber commented 2 years ago

After watching this more (our GitHub Actions testing is what revealed it), it seems that the R/OS version is what exposes the inconsistency. For other reasons we've dropped R 4.0, so I don't think it's worth either of us spending time on it at this point. It seems possible that some difference in another R package gives consistently different results when running on R 4.0 vs. 4.1+, but I have not attempted to figure out what.