MarioniLab / scran

Clone of the Bioconductor repository for the scran package.
https://bioconductor.org/packages/devel/bioc/html/scran.html
40 stars 22 forks source link

mnnCorrect output generates error when calculating TSNE #10

Closed MwrulesC closed 6 years ago

MwrulesC commented 6 years ago

Hi, I don't have much experience with scran, but I've recently been testing the use of mnnCorrect on some of the samples I have. Everything seems to work and I can generate figures up until I try to run TSNE calculations on the generated output. At that point I get the error message: Error in svd(x, nu = 0, nv = k) : a dimension is zero

After some digging around, I haven't quite figured out how to solve this error message in this particular instance. I was just wondering if you've had experience with something like this before, and if I should be doing some extra filtering or normalization pre/post analysis to avoid this.

The general overview of the code I use is the following:

rawData <- Read10X(filepath1) fixed <- SingleCellExperiment(assays = list(counts=as.matrix(rawData)), colData = list(Sample = rep('Fixed',dim(rawData)[2]))) fixed <- computeSumFactors(fixed) fixed <- normalize(fixed)

rawData <- Read10X(filepath2) mixed <- SingleCellExperiment(assays = list(counts=as.matrix(rawData)), colData = list(Sample = rep('Mixed',dim(rawData)[2]))) mixed <- computeSumFactors(mixed) mixed <- normalize(mixed)

out <- mnnCorrect(fixed@assays$data$logcounts, mixed@assays$data$logcounts)

mixed@assays$data$mnnCounts <- out$corrected[[2]] fixed@assays$data$mnnCounts <- out$corrected[[1]] combined <- cbind(fixed, mixed) combined <- runPCA(combined, exprs_values = "mnnCounts") plotReducedDim(combined, use_dimred = "PCA", colour_by = "Sample", shape_by = "Sample")

combined <- runTSNE(combined, exprs_values = "logcounts") plotTSNE(combined, colour_by = "Sample", shape_by = "Sample")

Thanks

LTLA commented 6 years ago

This sounds more like an issue with scater's runTSNE rather than mnnCorrect, especially if you can successfully generate a PCA off the MNN-corrected values. Indeed, in the code above, you're not even running the t-SNE on the MNN-corrected values, but instead the "logcounts" assay.

I'm a bit bemused about how it is possible to have no dimensions in the svd function - that would suggest that you have no cells or no genes within the Rtsne call (itself within runTSNE). Off the top of my head, that would involve all-zero rows, or possibly a dataset full of duplicate cells.

I would suggest running debug(runTSNE), stepping through the function, and investigating the value of vals that is being passed into the Rtsne call. If this has zero dimensions, then it's a fault in runTSNE; if it has non-zero dimensions, then it's a fault in Rtsne.

In addition, here are some more general comments:

  1. The DropletUtils package can directly read 10X results into a SingleCellExperiment with the read10xCounts function, no need for a manual conversion.
  2. I strongly recommend against using @ to get or set data values in a S4 class. This is widely considered to be Bad Practice; use the appropriate getters/setters instead. For example, assay(blah) and assay(blah) <- x.
  3. Always report your session information when posting issues, see sessionInfo(). When using Bioconductor packages, make sure you're using the latest version (BioC 3.7).
MwrulesC commented 6 years ago

I was actually running on the mnnCorrect, but I think since it was giving me an error I didn't add that command to my script that I was copying from.

Thanks for the information, I'll look more into what I get when I try running debug and seeing what my output looks like.

Also thanks for the general comments, I do have a tendency to just try to hack together something that works. I'll also look more into DropletUtils, I had found it earlier but it wasn't installed on the server I was using and so I hadn't tried incorporating it yet.

In case it is still useful, here is what I have for my sessionInfo output: R version 3.4.3 (2017-11-30) Platform: x86_64-apple-darwin15.6.0 (64-bit) Running under: macOS Sierra 10.12.6

Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages: [1] scater_1.6.3 scran_1.6.9 SingleCellExperiment_1.0.0 [4] SummarizedExperiment_1.8.1 DelayedArray_0.4.1 matrixStats_0.53.1
[7] Biobase_2.38.0 GenomicRanges_1.30.3 GenomeInfoDb_1.14.0
[10] IRanges_2.12.0 S4Vectors_0.16.0 BiocGenerics_0.24.0
[13] BiocParallel_1.12.0 Seurat_2.2.1 Matrix_1.2-12
[16] cowplot_0.9.2 ggplot2_2.2.1

loaded via a namespace (and not attached): [1] shinydashboard_0.6.1 R.utils_2.6.0 tidyselect_0.2.4 RSQLite_2.0
[5] AnnotationDbi_1.40.0 htmlwidgets_1.0 grid_3.4.3 trimcluster_0.1-2
[9] ranger_0.9.0 Rtsne_0.13 devtools_1.13.5 munsell_0.4.3
[13] codetools_0.2-15 ica_1.0-1 DT_0.4 statmod_1.4.30
[17] withr_2.1.2 colorspace_1.3-2 knitr_1.20 rstudioapi_0.7
[21] ROCR_1.0-7 robustbase_0.92-8 dtw_1.18-1 dimRed_0.1.0
[25] lars_1.2 tximport_1.7.14 GenomeInfoDbData_1.0.0 mnormt_1.5-5
[29] bit64_0.9-7 rhdf5_2.22.0 ipred_0.9-6 diptest_0.75-7
[33] R6_2.2.2 ggbeeswarm_0.6.0 VGAM_1.0-5 locfit_1.5-9.1
[37] flexmix_2.3-14 DRR_0.0.3 bitops_1.0-6 assertthat_0.2.0
[41] SDMTools_1.1-221 scales_0.5.0 nnet_7.3-12 beeswarm_0.2.3
[45] gtable_0.2.0 ddalpha_1.3.1.1 timeDate_3043.102 rlang_0.2.0
[49] CVST_0.2-1 scatterplot3d_0.3-41 RcppRoll_0.2.2 splines_3.4.3
[53] lazyeval_0.2.1 ModelMetrics_1.1.0 acepack_1.4.1 broom_0.4.3
[57] checkmate_1.8.5 yaml_2.1.18 reshape2_1.4.3 backports_1.1.2
[61] httpuv_1.3.6.2 Hmisc_4.1-1 caret_6.0-78 tools_3.4.3
[65] lava_1.6 psych_1.7.8 gplots_3.0.1 RColorBrewer_1.1-2
[69] proxy_0.4-21 dynamicTreeCut_1.63-1 ggridges_0.4.1 Rcpp_0.12.16
[73] plyr_1.8.4 base64enc_0.1-3 progress_1.1.2 zlibbioc_1.24.0
[77] purrr_0.2.4 RCurl_1.95-4.10 prettyunits_1.0.2 rpart_4.1-13
[81] pbapply_1.3-4 viridis_0.5.0 zoo_1.8-1 sfsmisc_1.1-2
[85] cluster_2.0.6 magrittr_1.5 data.table_1.10.4-3 mvtnorm_1.0-7
[89] mime_0.5 xtable_1.8-2 XML_3.98-1.10 mclust_5.4
[93] gridExtra_2.3 compiler_3.4.3 biomaRt_2.34.2 tibble_1.4.2
[97] KernSmooth_2.23-15 R.oo_1.21.0 htmltools_0.3.6 segmented_0.5-3.0
[101] Formula_1.2-2 tidyr_0.8.0 tclust_1.3-1 lubridate_1.7.3
[105] DBI_0.8 diffusionMap_1.1-0 MASS_7.3-49 fpc_2.1-11
[109] R.methodsS3_1.7.1 gdata_2.18.0 metap_0.8 bindr_0.1.1
[113] gower_0.1.2 igraph_1.2.1 pkgconfig_2.0.1 sn_1.5-1
[117] numDeriv_2016.8-1 foreign_0.8-69 recipes_0.1.2 foreach_1.4.4
[121] vipor_0.4.5 XVector_0.18.0 prodlim_1.6.1 stringr_1.3.0
[125] digest_0.6.15 tsne_0.1-3 htmlTable_1.11.2 edgeR_3.20.9
[129] kernlab_0.9-25 shiny_1.0.5 gtools_3.5.0 modeltools_0.2-21
[133] rjson_0.2.15 nlme_3.1-131.1 bindrcpp_0.2 viridisLite_0.3.0
[137] limma_3.34.9 pillar_1.2.1 lattice_0.20-35 httr_1.3.1
[141] DEoptimR_1.0-8 survival_2.41-3 glue_1.2.0 FNN_1.1
[145] prabclus_2.2-6 iterators_1.0.9 bit_1.1-12 class_7.3-14
[149] stringi_1.1.7 mixtools_1.1.0 blob_1.1.0 latticeExtra_0.6-28
[153] caTools_1.17.1 memoise_1.1.0 dplyr_0.7.4 irlba_2.3.2
[157] ape_5.0

LTLA commented 6 years ago

The Bioconductor packages you're using are from BioC 3.6. The latest version of Bioconductor is 3.7, running on R 3.5.0. I suggest you try upgrading R and the associated packages, to see whether the behaviour persists. Indeed, even if there is a bug in one of my packages, I will only be able to implement a bug fix to the latest release version, so you would have to upgrade to take advantage of it anyway.

MwrulesC commented 6 years ago

After installing R 3.5.0 and the associated packages, it seems like it has fixed the problem. Thanks for all your help.

An unrelated question, but I was just wondering. Let's say I have the situation where I have 3 samples. Sample 1 is composed of a cells from a normal cell line, sample 2 is composed of cells from a tumor cell line, and sample 3 is composed of cells from a mixture of the two. Is there a way for me to apply mnnCorrect if I observe batch effects?

Thanks!

LTLA commented 6 years ago

Yes, that's the whole raison d'etre of mnnCorrect. As long as there are shared populations across batches, then the correction can be performed. However, the order matters as it determines how mutual nearest neighbours are identified across batches. Samples 1 and 2 do not share any cell types, so there should not be any (sensible) MNNs between them. This suggests that you should correct 1 and 3 first, before correcting 2; or vice versa, i.e., 2 and 3 and then 1.

MwrulesC commented 6 years ago

Thanks for the information. Just to double check, would this be the order in which I list the variables when running mnnCorrect, or should I run mnnCorrect to times do this type of correction?

LTLA commented 6 years ago

The former; the order in which you pass batches to the ... argument.

MwrulesC commented 6 years ago

Thanks for the quick response. I'll give it a try, and hopefully I'll get it to work properly.

Thanks for all your help resolving my previous problems.