Vivianstats / scImpute

Accurate and robust imputation of scRNA-seq data
https://www.nature.com/articles/s41467-018-03405-7

No imputation with Kcluster = 1 #9

Closed fbrundu closed 6 years ago

fbrundu commented 6 years ago

Dear Vivian, I am running scImpute on the 293T dataset, which should be the same one used in the scImpute paper. Using k = 1 (all cells are clustered as the same cell type), scImpute does not impute any values. As you can see in the attached image, the percentiles are the same for the raw and imputed datasets. Is this the correct behavior?

Thanks, Francesco

[Attached image: scimpute_k1_perc]

Vivianstats commented 6 years ago

Hello Francesco,

Could you directly compare the raw count matrix and the imputed count matrix? Are the counts identical for every entry? If that's the case, please report it here and I will investigate why the method does not work.

Thanks, Vivian
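The entry-by-entry comparison requested above can be sketched in R as follows. This is only an illustration on made-up toy matrices: `compare_counts()` and its `tol` argument are hypothetical names, not part of scImpute. An entrywise check is more direct than comparing percentiles, since summary statistics can coincide even when some individual entries differ.

```r
# Toy matrices standing in for the raw and imputed count tables
# (genes in rows, cells in columns); the values are made up.
raw <- matrix(c(0, 5, 0, 3,
                2, 0, 0, 7),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("gene1", "gene2"), paste0("cell", 1:4)))
imputed <- raw
imputed["gene1", "cell3"] <- 1.8  # pretend a single zero was imputed

# compare_counts() is a hypothetical helper, not a scImpute function.
compare_counts <- function(raw, imputed, tol = 1e-8) {
  raw <- as.matrix(raw)
  imputed <- as.matrix(imputed)
  stopifnot(identical(dim(raw), dim(imputed)))
  changed <- abs(raw - imputed) > tol
  list(
    n_changed      = sum(changed),             # entries that differ at all
    n_zero_filled  = sum(changed & raw == 0),  # raw zeros that were replaced
    total_abs_diff = sum(abs(raw - imputed))   # 0 means nothing was imputed
  )
}

res <- compare_counts(raw, imputed)
print(res)
```

If `n_changed` is 0 for the real input and output files, the imputed matrix is identical to the raw one.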

dmoaks commented 6 years ago

Hi Vivian,

I'm a colleague of Francesco. We tried imputing with k=1 and there was no difference from the input array. Should values still be imputed if k=1?

Vivianstats commented 6 years ago

Hello dmoaks,

Yes, if you set Kcluster = 1, scImpute still tries to impute the gene expression. I tested the package on multiple datasets and was able to get imputed results with Kcluster = 1.

If you have verified that you are using the latest release and all the imputed values are exactly the same as the raw expression, can you send me a smaller test dataset to diagnose the problem?

Thanks, Vivian

dmoaks commented 6 years ago

[Attached file: preimpute_1Krandom.txt]

Hi Vivian,

I've tried several data sets and still see no change when Kcluster = 1. I have the most current version of scImpute.

Thanks, Dan

Vivianstats commented 6 years ago

Hello Dan,

I just updated the package and it now should work on your data. Thanks very much for your feedback and please let me know if you have further questions.

dmoaks commented 6 years ago

Hi Vivian,

Thanks for the quick responses. I updated scImpute and tried again with k=1, but there are still no changes from the pre-imputation data. Here are my session details:

    scimpute(
      count_path = "preimpute_1Krandom.txt", # full path to raw count matrix
      infile = "txt", # format of input file
      outfile = "txt", # format of output file
      out_dir = "./", # full path to output directory
      labeled = FALSE, # cell type labels not available
      drop_thre = 0.9, # threshold set on dropout probability
      Kcluster = 1, # 1 cell subpopulation
      ncores = 1) # number of cores used in parallel computation
    [1] "reading in raw count matrix ..."
    [1] "number of genes in raw count matrix 1000"
    [1] "number of cells in raw count matrix 1000"
    [1] "estimating dropout probability for type 1 ..."
    [1] "imputing dropout values for type 1 ..."
    [1] "writing imputed count matrix ..."
    integer(0)

    sessionInfo()
    R version 3.4.3 (2017-11-30)
    Platform: x86_64-w64-mingw32/x64 (64-bit)
    Running under: Windows >= 8 x64 (build 9200)

    Matrix products: default

    locale:
    [1] LC_COLLATE=English_United States.1252
    [2] LC_CTYPE=English_United States.1252
    [3] LC_MONETARY=English_United States.1252
    [4] LC_NUMERIC=C
    [5] LC_TIME=English_United States.1252

    attached base packages:
    [1] parallel stats graphics grDevices utils datasets methods
    [8] base

    other attached packages:
    [1] scImpute_0.0.5 doParallel_1.0.11 iterators_1.0.9
    [4] foreach_1.4.4 penalized_0.9-50 survival_2.41-3
    [7] kernlab_0.9-25

    loaded via a namespace (and not attached):
    [1] Rcpp_0.12.15 knitr_1.18 devtools_1.13.4 splines_3.4.3
    [5] munsell_0.4.3 colorspace_1.3-2 lattice_0.20-35 R6_2.2.2
    [9] rlang_0.1.6 httr_1.3.1 plyr_1.8.4 tools_3.4.3
    [13] grid_3.4.3 gtable_0.2.0 git2r_0.20.0 withr_2.1.1
    [17] lazyeval_0.2.1 digest_0.6.13 tibble_1.4.1 Matrix_1.2-12
    [21] ggplot2_2.2.1 codetools_0.2-15 curl_3.1 memoise_1.1.0
    [25] compiler_3.4.3 pillar_1.0.1 scales_0.5.0

Vivianstats commented 6 years ago

Hello Dan,

That's surprising. The printed messages look correct, so I would suggest that you first make sure the newest version of the package was installed successfully. Also, please check that you are loading the correct input and output files.
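One way to check which version R actually sees is sketched below. `installed_version()` is a hypothetical helper, not part of scImpute; it relies only on base R's `requireNamespace()` and `utils::packageVersion()`. Note that both session listings in this thread report scImpute_0.0.5, so the version string alone may not reveal whether the latest update is installed; reinstalling from the repository and restarting R is likely the safer check.

```r
# installed_version() is a hypothetical helper, not a scImpute function.
# It returns the installed version as a string, or NA if the package
# cannot be loaded from the current library paths.
installed_version <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(NA_character_)
  }
  as.character(utils::packageVersion(pkg))
}

print(installed_version("scImpute"))  # NA if scImpute is not installed
```

Running this in a fresh R session avoids picking up an older copy of the package that is still loaded in memory.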

I'm attaching the code I used for testing here:

    rm(list = ls())
    library(scImpute)

    count_path = "./preimpute_1Krandom.txt"

    K = 1
    drop_thre = 0.5
    ncores = 30
    out_dir = "./"
    dir.create(out_dir)

    scimpute(count_path, infile = "txt", outfile = "txt", out_dir, labeled = FALSE, drop_thre = 0.5, Kcluster = K, ncores = ncores)

    count = read.table(count_path, row.names = 1, header = TRUE)
    imp_count = read.table("./scimpute_count.txt", header = TRUE, row.names = 1)
    sum(abs(count - imp_count))

This scatterplot (log scale) shows that imputation is working: K1.pdf
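A plot of this kind could be produced roughly as follows. This is a sketch on simulated toy data, not the code behind K1.pdf: the `log10(x + 1)` transform, the file name, and the simulated "imputed" values are all assumptions made for illustration.

```r
set.seed(1)

# Simulated stand-ins for the raw and imputed matrices (assumption:
# only zero entries are modified, and only upwards).
raw <- matrix(rpois(200, lambda = 2), nrow = 20)
imputed <- raw
zero_idx <- which(raw == 0)
imputed[zero_idx] <- runif(length(zero_idx), min = 0.5, max = 3)

# Log-transform with a pseudocount of 1 so that zeros stay finite.
log_raw <- log10(as.vector(raw) + 1)
log_imp <- log10(as.vector(imputed) + 1)

# Write the scatterplot to a temporary PDF file.
out_file <- file.path(tempdir(), "raw_vs_imputed.pdf")
pdf(out_file)
plot(log_raw, log_imp,
     xlab = "log10(raw count + 1)",
     ylab = "log10(imputed count + 1)",
     pch = 16, cex = 0.6)
abline(0, 1, col = "red")  # points on this line were left unchanged
invisible(dev.off())
```

Points above the diagonal correspond to entries whose values were raised by imputation; if every point lies exactly on the line, nothing was imputed.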

I'm also attaching my R session info. I notice that we use different versions of R, but I don't think this is the cause.

    R version 3.4.1 (2017-06-30)
    Platform: x86_64-pc-linux-gnu (64-bit)
    Running under: Ubuntu 14.04.5 LTS

    Matrix products: default
    BLAS: /usr/lib/openblas-base/libblas.so.3
    LAPACK: /usr/lib/lapack/liblapack.so.3.0

    locale:
    [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
    [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
    [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
    [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
    [9] LC_ADDRESS=C LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] parallel stats graphics grDevices utils datasets methods
    [8] base

    other attached packages:
    [1] scImpute_0.0.5 doParallel_1.0.11 iterators_1.0.8 foreach_1.4.3
    [5] penalized_0.9-50 survival_2.41-3 kernlab_0.9-25

    loaded via a namespace (and not attached):
    [1] compiler_3.4.1 Matrix_1.2-11 Rcpp_0.12.13 codetools_0.2-15
    [5] splines_3.4.1 grid_3.4.1 lattice_0.20-35

fbrundu commented 6 years ago

I can confirm that imputation with k = 1 now works on my side. We can close.