Different output between doublets predicted

TomKellyGenetics commented 6 years ago

On an in-house test dataset, the predict method identifies 213/2332 cells as doublets:

> labels <- clf$predict()
> sum(labels)
213

However, the convergence plot counts only 127/2332 doublets:

> cumulative_doublets <- convergence(clf)
> cumulative_doublets[clf$n_iters]
127

The two sets of doublet calls do not overlap at all:

           predict
convergence    0    1
          0 1992  213
          1  127    0

The doublets called by the predict method are more consistent with those from the Python implementation (which identifies 97/2332 cells as doublets) than the convergence plot doublets are:

           Python 2.3.0
convergence    0    1
          0 2108   97
          1  127    0

       Python 2.3.0
predict    0    1
      0 2115    4
      1  120   93
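For anyone wanting to reproduce this kind of comparison, cross-tabulations like the ones above can be built with base R's table(). This is only a minimal sketch on made-up label vectors, not the in-house data; predict_labels and convergence_labels are placeholder names for the per-cell 0/1 doublet calls from each method.

```r
# Toy 0/1 doublet calls for six cells (illustrative values only,
# not taken from the test dataset).
predict_labels     <- c(0, 1, 1, 0, 0, 1)
convergence_labels <- c(1, 0, 0, 0, 1, 0)

# Cross-tabulate: rows are convergence calls, columns are predict calls.
tab <- table(convergence = convergence_labels, predict = predict_labels)
print(tab)

# Cells called a doublet by both methods (row "1", column "1").
n_both <- tab["1", "1"]
```

A zero in the bottom-right cell, as in the first table above, means no cell is called a doublet by both methods.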

The "convergence" plot doublets appear to be computed correctly. Running the predict method as a standalone function on all_log_p_values <- clf$all_log_p_values returns the same result. Because the two functions give the same output on the same test inputs, the discrepancy appears to come from how variables are passed to them.

Note: computing doublets from log-p-values (log-p <= log(0.01)) or directly from p-values (p < 0.01) returns the same results in R.
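The equivalence in the note can be checked with a short sketch. It uses simulated p-values rather than the classifier's output, so p_values, log_p_values and threshold are assumed stand-ins, not objects from the package.

```r
set.seed(1)

# Simulated p-values standing in for the per-cell p-values (assumption).
p_values     <- runif(1000)
log_p_values <- log(p_values)

threshold <- 0.01

# Call doublets on the log scale and on the raw scale.
labels_from_log_p <- log_p_values <= log(threshold)
labels_from_p     <- p_values <= threshold

# log() is increasing, so both comparisons should select the same cells
# (barring values within rounding error of the threshold).
same_calls <- identical(labels_from_log_p, labels_from_p)
print(same_calls)
```

Using < on one scale and <= on the other only changes the result for a p-value exactly equal to the threshold, which would explain why both forms agree in practice.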

TomKellyGenetics commented 6 years ago

This may be resolved by updating to the current version (with thresholds corresponding to internal use of log-p-values). Testing is currently underway.

TomKellyGenetics commented 6 years ago

The outputs of predict and convergence appear to be consistent upon updating the package on the test server.
