Irrationone / cellassign

Automated, probabilistic assignment of cell types in scRNA-seq data

About results reproducibility #35

Closed RichardLZQ closed 4 years ago

RichardLZQ commented 5 years ago

Dear Cellassign team,

Thanks for your great work. Cellassign successfully solved our problem of identifying cell subtypes.

However, we ran into some small problems when we tried to reproduce cellassign results.

  1. We used Cellassign 0.99.0 and TensorFlow 1.8.0, and about 10% of cells were assigned to different types across 3 runs of the same code.

  2. Recently, we upgraded TensorFlow to 1.9.0 and also enabled AVX2 and FMA (our CPU supports both extensions, and TensorFlow kept warning about them). We also switched to the latest Cellassign, 0.99.2. However, the proportion of unstable cells rose to 99%: only 1% of cells were assigned to the same type across 3 runs.

Since I'm still a beginner with TensorFlow and Python, it's really not easy for me to debug this myself.

Could you please give me some suggestions?

Many thanks,

Richard, The Chinese University of Hong Kong

kieranrcampbell commented 5 years ago

Hi Richard,

Thanks for your feedback.

CellAssign uses expectation maximization, which is only guaranteed to converge to a local maximum of the log likelihood. So it actually performs multiple fits from random initializations and chooses the best one based on the log likelihood.

A few questions:
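
For readers unfamiliar with this pattern, here is a minimal base R sketch of "fit several times from random starts, keep the best log likelihood". It uses a toy two-component Gaussian mixture, not CellAssign's actual model, and the function name `em_fit` is purely illustrative.

```r
# Toy illustration only: EM for a two-component 1-D Gaussian mixture
# with unit variances and equal weights. This is NOT cellassign's model.
em_fit <- function(x, mu_init, n_iter = 50) {
  mu <- mu_init
  for (i in seq_len(n_iter)) {
    # E-step: responsibility of component 1 for each observation
    d1 <- dnorm(x, mu[1], 1)
    d2 <- dnorm(x, mu[2], 1)
    r1 <- d1 / (d1 + d2)
    # M-step: means become responsibility-weighted averages
    mu <- c(sum(r1 * x) / sum(r1), sum((1 - r1) * x) / sum(1 - r1))
  }
  loglik <- sum(log(0.5 * dnorm(x, mu[1], 1) + 0.5 * dnorm(x, mu[2], 1)))
  list(mu = mu, loglik = loglik)
}

set.seed(1)
x <- c(rnorm(100, -2), rnorm(100, 3))

# Several fits from random starting points; keep the highest log likelihood
fits <- lapply(1:5, function(i) em_fit(x, mu_init = runif(2, min(x), max(x))))
logliks <- vapply(fits, function(f) f$loglik, numeric(1))
best <- fits[[which.max(logliks)]]
```

If the log likelihoods of the individual fits differ a lot, the starting point matters and a single run should not be trusted on its own.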

  1. Are you setting a seed when doing the analysis?
  2. Are you using the latest version of CellAssign on github?
  3. For the cells that change assignment between runs, what does the probability of assignment look like? I.e., if you keep only the highly confidently assigned cells, does this issue go away?
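
One way to check the third question is sketched below in base R. The probability matrices are simulated here so the snippet is self-contained; with real fits they would be the per-cell assignment probabilities from each run (possibly obtainable via `cellprobs(fit)`, which is an assumption to verify against your installed version). The 0.9 threshold and all variable names are illustrative.

```r
# Simulated stand-ins for two runs' assignment probabilities (cells x types).
set.seed(42)
types <- c("B cell", "CD4 T cell", "CD8 T cell")
n <- 200
truth <- sample(seq_along(types), n, replace = TRUE)
ambiguous <- rep(c(FALSE, TRUE), length.out = n)  # every second cell is hard
rival <- truth %% length(types) + 1               # a competing type per cell

sim_probs <- function() {
  p <- matrix(runif(n * length(types), 0, 0.02), nrow = n,
              dimnames = list(NULL, types))
  p[cbind(seq_len(n), truth)] <- 1                # strong signal for true type
  # ambiguous cells get a rival type with nearly equal weight
  p[cbind(which(ambiguous), rival[ambiguous])] <- runif(sum(ambiguous), 0.8, 1.2)
  p / rowSums(p)                                  # rows sum to 1
}
probs_run1 <- sim_probs()
probs_run2 <- sim_probs()

# Most probable type per cell in each run
call1 <- types[max.col(probs_run1, ties.method = "first")]
call2 <- types[max.col(probs_run2, ties.method = "first")]

# A cell counts as confident only if both runs assign it with high probability
conf <- pmin(apply(probs_run1, 1, max), apply(probs_run2, 1, max))
confident <- conf >= 0.9                          # illustrative threshold

agreement_all  <- mean(call1 == call2)
agreement_conf <- mean(call1[confident] == call2[confident])
table(run1 = call1[confident], run2 = call2[confident])
```

If `agreement_conf` is close to 1 while `agreement_all` is not, the instability is concentrated in cells the model was never sure about.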

Thanks

Kieran

robertamezquita commented 5 years ago

I've also run into this reproducibility issue: running on the same data set and setting the seed (both in R and via reticulate's py_set_seed function) does not ensure reproducibility between runs.

This seems to occur more for cell types that are more easily confused in my case (see confusion matrix below), but regardless, subsequent runs should be reproducible if nothing has changed.

For reference, here is how I ran cellassign:

library(cellassign)
library(tensorflow)

set.seed(1234)                # set R rng seed
reticulate::py_set_seed(1234) # set python rng seed
fit <- cellassign(sce[rownames(rho), ],
                  marker_gene_info = rho,
                  s = sizeFactors(sce))

Any tips would be appreciated ~

                  B cell  CD4 T cell  CD8 T cell  Dendritic cell  Monocyte  NK cell
  B cell             179           0           9               0         0        0
  CD4 T cell           3           2         971               0         0        0
  CD8 T cell           0           0         136               0         0        0
  Dendritic cell       0           0           0              17         1        0
  Monocyte             0           0          43               0       416        0
  NK cell              0           0          53               0         0      192

kieranrcampbell commented 5 years ago

Thanks @robertamezquita

I think the seed needs to be set in the TensorFlow session, rather than in Python. If I remember correctly from clonealign, this is a bit painful, but I'll make it a priority since it's important. On a side note, if the results are (largely) dependent on the seed, it means the likelihood surface is so bumpy that I wouldn't trust them. You can use cellassign to perform multiple runs from different initializations and select the one with the highest log likelihood using the num_runs option.

kieranrcampbell commented 4 years ago

Hi both,

Sorry, I thought I'd addressed this but clearly hadn't. This is now fixed in fcf161d. In particular, cellassign now gets a new seed simply by calling sample, and that seed is then passed to the TensorFlow session, so set.seed should work with cellassign. Let me know if you run into any further problems.
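
The idea behind the fix can be sketched in base R as follows. The helper name `draw_backend_seed` and the seed range are hypothetical and not taken from the cellassign source; the point is only that a seed drawn with `sample` is itself determined by `set.seed`.

```r
# Hypothetical helper: draw an integer seed from R's RNG, so that the value
# handed to the backend is controlled by set.seed(). Name and range are
# illustrative assumptions, not cellassign's actual code.
draw_backend_seed <- function() sample.int(.Machine$integer.max, 1)

set.seed(1234)
seed_a <- draw_backend_seed()

set.seed(1234)
seed_b <- draw_backend_seed()

# Same R seed gives the same derived backend seed
same_seed <- identical(seed_a, seed_b)
```

Because the derived seed is a deterministic function of R's RNG state, calling `set.seed` before the fit is enough to pin the backend seed as well.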

olechnwin commented 4 years ago

Hi, sorry for the stupid question. Can you please let me know how to install the cellassign version that has the seed setting fixed?

kieranrcampbell commented 4 years ago

No stupid questions here. If you reinstall cellassign with devtools::install_github, making sure you get version 0.99.16, you should be able to do

set.seed(123)

cellassign(sce, etc)

Please let me know if it doesn't work!