MarioniLab / FurtherMNN2018

Code for further development of the mutual nearest neighbours batch correction method, as implemented in the batchelor package.

Data downloading #10

Closed wenbostar closed 5 years ago

wenbostar commented 5 years ago

I got the following error when I tried to run "haematopoiesis/prepareData.R":

adding rname 'ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE81nnn/GSE81682/suppl/GSE81682_HTSeq_counts.txt.gz'
Error in bfcrpath(bfc, "ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE81nnn/GSE81682/suppl/GSE81682_HTSeq_counts.txt.gz") :
  not all 'rnames' found or unique.
Calls: bfcrpath -> bfcrpath

How can I fix this problem?

LTLA commented 5 years ago

Session info?

wenbostar commented 5 years ago

For the haematopoiesis dataset, I just tried again and the error was gone. But for the pancreas dataset, I got the following error:

adding rname 'https://jmlab-gitlab.cruk.cam.ac.uk/publications/MNN2017-DataFiles/raw/4d649e8865cb2b924b61f6bd3f908865dfe0f560/GSE86473//GSE86473_experimental_design.tsv'
  |======================================================================| 100%

Error in bfcrpath(bfc, file.path(host.path, "GSE86473_experimental_design.tsv")) :
  not all 'rnames' found or unique.
1: In isOutlier(sce.gse81076$scater_qc$feature_control_ERCC$pct_counts,  :
  missing values ignored during outlier detection
2: download failed
  web resource path: ‘https://jmlab-gitlab.cruk.cam.ac.uk/publications/MNN2017-DataFiles/raw/4d649e8865cb2b924b61f6bd3f908865dfe0f560/GSE86473//GSE86473_experimental_design.tsv’
  local file path: ‘/Users/wb/Library/Caches/BiocFileCache/b53561eff5e_GSE86473_experimental_design.tsv’
  reason: Not Found (HTTP 404).
3: bfcadd() failed; resource removed
  rid: BFC20
  fpath: ‘https://jmlab-gitlab.cruk.cam.ac.uk/publications/MNN2017-DataFiles/raw/4d649e8865cb2b924b61f6bd3f908865dfe0f560/GSE86473//GSE86473_experimental_design.tsv’
  reason: download failed
4: In value[[3L]](cond) :
trying to add rname 'https://jmlab-gitlab.cruk.cam.ac.uk/publications/MNN2017-DataFiles/raw/4d649e8865cb2b924b61f6bd3f908865dfe0f560/GSE86473//GSE86473_experimental_design.tsv' produced error:
  bfcadd() failed; see warnings()
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS  10.14.3

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] BiocFileCache_1.6.0         dbplyr_1.3.0
 [3] scater_1.10.1               ggplot2_3.1.0
 [5] biomaRt_2.38.0              scran_1.10.2
 [7] SingleCellExperiment_1.4.1  SummarizedExperiment_1.12.0
 [9] DelayedArray_0.8.0          matrixStats_0.54.0
[11] Biobase_2.42.0              GenomicRanges_1.34.0
[13] GenomeInfoDb_1.18.2         IRanges_2.16.0
[15] S4Vectors_0.20.1            BiocGenerics_0.28.0
[17] BiocParallel_1.16.6

loaded via a namespace (and not attached):
 [1] viridis_0.5.1            httr_1.4.0               dynamicTreeCut_1.63-1
 [4] edgeR_3.24.3             bit64_0.9-7              viridisLite_0.3.0
 [7] DelayedMatrixStats_1.4.0 assertthat_0.2.0         statmod_1.4.30
[10] blob_1.1.1               GenomeInfoDbData_1.2.0   vipor_0.4.5
[13] progress_1.2.0           pillar_1.3.1             RSQLite_2.1.1
[16] lattice_0.20-38          glue_1.3.0               limma_3.38.3
[19] digest_0.6.18            XVector_0.22.0           colorspace_1.4-0
[22] Matrix_1.2-15            plyr_1.8.4               XML_3.98-1.17
[25] pkgconfig_2.0.2          zlibbioc_1.28.0          purrr_0.3.0
[28] scales_1.0.0             HDF5Array_1.10.1         tibble_2.0.1
[31] withr_2.1.2              lazyeval_0.2.1           magrittr_1.5
[34] crayon_1.3.4             memoise_1.1.0            beeswarm_0.2.3
[37] tools_3.5.1              prettyunits_1.0.2        hms_0.4.2
[40] stringr_1.4.0            Rhdf5lib_1.4.2           munsell_0.5.0
[43] locfit_1.5-9.1           irlba_2.3.3              AnnotationDbi_1.44.0
[46] compiler_3.5.1           rlang_0.3.1              rhdf5_2.26.2
[49] grid_3.5.1               RCurl_1.95-4.11          BiocNeighbors_1.0.0
[52] rappdirs_0.3.1           igraph_1.2.4             bitops_1.0-6
[55] gtable_0.2.0             curl_3.3                 DBI_1.0.0
[58] reshape2_1.4.3           R6_2.4.0                 gridExtra_2.3
[61] dplyr_0.8.0.1            bit_1.1-14               stringi_1.3.1
[64] ggbeeswarm_0.6.0         Rcpp_1.0.0               tidyselect_0.2.5

LTLA commented 5 years ago

Yes, that will be fixed later today. I was going to update these scripts anyway, so you're in luck.

wenbostar commented 5 years ago

It's great. Thanks.

LTLA commented 5 years ago

Should be fixed by 4f00208544d8c07730134a98a05a3288ade0869d. Be aware that the corresponding correction script hasn't been updated, so YMMV until I get around to it tomorrow.

wenbostar commented 5 years ago

I got the following error for your new version:

setwd("FurtherMNN2018/haematopoiesis")
library(rmarkdown)
render("prepare.Rmd")

label: unnamed-chunk-8
  |............................                                       |  42%
  ordinary text without R code

  |..............................                                     |  45%
label: unnamed-chunk-9

Attaching package: 'BiocSingular'

The following object is masked from 'package:scater':

    runPCA

Quitting from lines 94-98 (prepare.Rmd)
Error in cutreeDynamic(htree, minClusterSize = min.size, distM = as.matrix(distM),  :
  unused arguments (use.ranks = FALSE, BSPARAM = IrlbaParam())
Calls: render ... quickCluster -> quickCluster -> .local -> .quick_cluster -> unname

LTLA commented 5 years ago

If you're running a Bioc-devel build, then scran was just updated last night - 1.11.20.

wenbostar commented 5 years ago

I still got an error after I updated scran to 1.11.20:

library(rmarkdown)
render("prepare.Rmd")

  |...........................                                        |  40%
label: unnamed-chunk-8
  |............................                                       |  42%
  ordinary text without R code

  |..............................                                     |  45%
label: unnamed-chunk-9

Attaching package: 'BiocSingular'

The following object is masked from 'package:scater':

    runPCA

Quitting from lines 94-98 (prepare.Rmd)
Error in .local(x, ...) : unused argument (BSPARAM = IrlbaParam())
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Amazon Linux AMI 2018.03

Matrix products: default
BLAS: /home/test/software/R/3/lib64/R/lib/libRblas.so
LAPACK: /home/test/software/R/3/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] BiocSingular_0.99.12        scran_1.11.20
 [3] scater_1.10.1               ggplot2_3.1.0
 [5] SingleCellExperiment_1.4.1  SummarizedExperiment_1.12.0
 [7] DelayedArray_0.8.0          BiocParallel_1.16.6
 [9] matrixStats_0.54.0          Biobase_2.42.0
[11] GenomicRanges_1.34.0        GenomeInfoDb_1.18.2
[13] IRanges_2.16.0              S4Vectors_0.20.1
[15] BiocGenerics_0.28.0         BiocFileCache_1.6.0
[17] dbplyr_1.3.0                BiocStyle_2.10.0
[19] rmarkdown_1.11

loaded via a namespace (and not attached):
 [1] dynamicTreeCut_1.63-1    viridis_0.5.1            httr_1.4.0
 [4] edgeR_3.24.3             bit64_0.9-7              viridisLite_0.3.0
 [7] DelayedMatrixStats_1.4.0 assertthat_0.2.0         statmod_1.4.30
[10] BiocManager_1.30.4       blob_1.1.1               GenomeInfoDbData_1.2.0
[13] vipor_0.4.5              yaml_2.2.0               pillar_1.3.1
[16] RSQLite_2.1.1            lattice_0.20-38          limma_3.38.3
[19] glue_1.3.0               digest_0.6.18            XVector_0.22.0
[22] colorspace_1.4-0         htmltools_0.3.6          Matrix_1.2-15
[25] plyr_1.8.4               pkgconfig_2.0.2          bookdown_0.9
[28] zlibbioc_1.28.0          purrr_0.3.1              scales_1.0.0
[31] HDF5Array_1.10.1         tibble_2.0.1             withr_2.1.2
[34] lazyeval_0.2.1           magrittr_1.5             crayon_1.3.4
[37] memoise_1.1.0            evaluate_0.13            beeswarm_0.2.3
[40] tools_3.5.1              stringr_1.4.0            Rhdf5lib_1.4.2
[43] locfit_1.5-9.1           munsell_0.5.0            irlba_2.3.3
[46] compiler_3.5.1           rsvd_1.0.0               rlang_0.3.1
[49] rhdf5_2.26.2             grid_3.5.1               RCurl_1.95-4.12
[52] BiocNeighbors_1.1.12     rappdirs_0.3.1           igraph_1.2.4
[55] bitops_1.0-6             gtable_0.2.0             DBI_1.0.0
[58] curl_3.3                 reshape2_1.4.3           R6_2.4.0
[61] gridExtra_2.3            knitr_1.21               dplyr_0.8.0.1
[64] bit_1.1-14               stringi_1.3.1            ggbeeswarm_0.6.0
[67] Rcpp_1.0.0               tidyselect_0.2.5         xfun_0.5
>

LTLA commented 5 years ago

You're using a crazy mix of Bioc-devel and Bioc-release packages. Get BiocManager::valid() to pass and try again. This will require R-devel - a bit annoying, but all of these packages are under active development.
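
One quick way to spot such a mix in a sessionInfo() listing is the Bioconductor version-numbering convention: packages in a release carry an even minor version (x.y.z with y even), while devel packages carry an odd one. The following is a minimal sketch under that assumption. The helper name bioc_channel is hypothetical, and the heuristic only makes sense for Bioconductor packages, not CRAN ones.

```r
# Hypothetical helper: guess the Bioconductor channel of a package from the
# parity of its minor version number (even = release, odd = devel).
# Only meaningful for Bioconductor packages; CRAN packages follow no such rule.
bioc_channel <- function(versions) {
  minor <- vapply(
    strsplit(versions, ".", fixed = TRUE),
    function(parts) as.integer(parts[2]),
    integer(1)
  )
  ifelse(minor %% 2L == 0L, "release", "devel")
}

# Versions copied from the sessionInfo() output above.
pkgs <- c(scran = "1.11.20", scater = "1.10.1",
          BiocNeighbors = "1.1.12", SingleCellExperiment = "1.4.1")
channels <- bioc_channel(pkgs)
print(channels)

# More than one channel in the same library indicates a mixed installation.
mixed <- length(unique(channels)) > 1
print(mixed)
```

This is only a rough check; BiocManager::valid() remains the authoritative test.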

wenbostar commented 5 years ago

When I tried to install the version of scran from the Bioc release channel, I got the following error. This is why I installed the version from the Bioc devel channel.

BiocManager::install("scran")

** R
** inst
** byte-compile and prepare package for lazy loading
Error : object ‘buildNNIndex’ is not exported by 'namespace:BiocNeighbors'
ERROR: lazy loading failed for package ‘scran’
* removing ‘/home/test/software/R/3/lib64/R/library/scran’
* restoring previous ‘/home/test/software/R/3/lib64/R/library/scran’

The downloaded source packages are in
    ‘/tmp/RtmpzHeWUJ/downloaded_packages’
Updating HTML index of packages in '.Library'
Making 'packages.html' ... done
Warning message:
In install.packages(pkgs = doing, lib = lib, repos = repos, ...) :
  installation of package ‘scran’ had non-zero exit status

LTLA commented 5 years ago

All of these problems arise from the same underlying source: you are mixing release and devel package versions, and thus your installation is not BiocManager::valid(). The Bioc-devel version of scran requires the Bioc-devel version of BiocNeighbors and so on. This applies to other packages as well, e.g., S4Vectors, BiocParallel. You are making life difficult for yourself (and anyone trying to help you) by manually mixing release and devel versions together. So:

  1. Install R-devel.
  2. Run install.packages("BiocManager").
  3. Run BiocManager::install("scran").
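
After these steps, it can help to see which packages, if any, are still out of step. The sketch below assumes the documented return value of BiocManager::valid(): TRUE when the library is consistent, otherwise a list whose too_new and out_of_date elements hold the offending packages as row names. The helper name summarise_valid is hypothetical. The call to BiocManager::valid() itself needs the package and network access, so it is left commented out.

```r
# Hypothetical helper: turn the result of BiocManager::valid() into a plain
# list of package names that need attention.
summarise_valid <- function(res) {
  if (isTRUE(res)) {
    return(list(valid = TRUE,
                too_new = character(0),
                out_of_date = character(0)))
  }
  # Extract row names (package names), tolerating NULL or empty tables.
  pkg_names <- function(tbl) {
    rn <- if (is.null(tbl)) NULL else rownames(tbl)
    if (is.null(rn)) character(0) else rn
  }
  list(
    valid = FALSE,
    too_new = pkg_names(res$too_new),
    out_of_date = pkg_names(res$out_of_date)
  )
}

# Typical use (requires BiocManager and network access):
# report <- summarise_valid(suppressWarnings(BiocManager::valid()))
# report$too_new      # packages newer than the current Bioconductor version
# report$out_of_date  # packages older than the current Bioconductor version
```

An empty too_new and out_of_date with valid = TRUE is the state to aim for before rerunning the scripts.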

wenbostar commented 5 years ago

I'm using R 3.5.1. Does FurtherMNN2018 only work well under R-devel?

LTLA commented 5 years ago

The FurtherMNN2018 code itself doesn't care what version of R you're on. But batchelor was written with submission to Bioconductor 3.9 in mind, which uses R-devel (to be R 3.6 later this year).

wenbostar commented 5 years ago

I just followed your suggestion and installed R-devel. However, I got the same error, as shown below:


library(rmarkdown)
render("prepare.Rmd")

label: unnamed-chunk-8
  |............................                                       |  42%
  ordinary text without R code

  |..............................                                     |  45%
label: unnamed-chunk-9
Quitting from lines 94-98 (prepare.Rmd)
Error in .local(x, ...) : unused argument (BSPARAM = IrlbaParam())

> sessionInfo()
R Under development (unstable) (2019-03-03 r76192)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Amazon Linux AMI 2018.03

Matrix products: default
BLAS: /home/test/software/R/dev/lib64/R/lib/libRblas.so
LAPACK: /home/test/software/R/dev/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] BiocSingular_0.99.12        scran_1.11.20
 [3] scater_1.11.12              ggplot2_3.1.0
 [5] SingleCellExperiment_1.5.2  SummarizedExperiment_1.13.0
 [7] DelayedArray_0.9.8          BiocParallel_1.17.17
 [9] matrixStats_0.54.0          Biobase_2.43.1
[11] GenomicRanges_1.35.1        GenomeInfoDb_1.19.2
[13] IRanges_2.17.4              S4Vectors_0.21.10
[15] BiocGenerics_0.29.1         BiocFileCache_1.7.0
[17] dbplyr_1.3.0                BiocStyle_2.11.0
[19] rmarkdown_1.11

loaded via a namespace (and not attached):
 [1] viridis_0.5.1            httr_1.4.0               dynamicTreeCut_1.63-1
 [4] edgeR_3.25.3             bit64_0.9-7              viridisLite_0.3.0
 [7] DelayedMatrixStats_1.5.2 assertthat_0.2.0         statmod_1.4.30
[10] BiocManager_1.30.4       blob_1.1.1               GenomeInfoDbData_1.2.0
[13] vipor_0.4.5              yaml_2.2.0               pillar_1.3.1
[16] RSQLite_2.1.1            lattice_0.20-38          limma_3.39.12
[19] glue_1.3.0               digest_0.6.18            XVector_0.23.0
[22] colorspace_1.4-0         htmltools_0.3.6          Matrix_1.2-15
[25] plyr_1.8.4               pkgconfig_2.0.2          bookdown_0.9
[28] zlibbioc_1.29.0          purrr_0.3.1              scales_1.0.0
[31] tibble_2.0.1             withr_2.1.2              lazyeval_0.2.1
[34] magrittr_1.5             crayon_1.3.4             memoise_1.1.0
[37] evaluate_0.13            beeswarm_0.2.3           tools_3.6.0
[40] stringr_1.4.0            locfit_1.5-9.1           munsell_0.5.0
[43] irlba_2.3.3              compiler_3.6.0           rsvd_1.0.0
[46] rlang_0.3.1              grid_3.6.0               RCurl_1.95-4.12
[49] BiocNeighbors_1.1.12     rappdirs_0.3.1           igraph_1.2.4
[52] bitops_1.0-6             gtable_0.2.0             DBI_1.0.0
[55] curl_3.3                 R6_2.4.0                 gridExtra_2.3
[58] knitr_1.21               dplyr_0.8.0.1            bit_1.1-14
[61] stringi_1.3.1            ggbeeswarm_0.6.0         Rcpp_1.0.0
[64] tidyselect_0.2.5         xfun_0.5
>

LTLA commented 5 years ago

I don't know why this error is happening for you. Setting BSPARAM= in quickCluster() works for me, and it compiles elsewhere. Try it locally instead of on an AMI.

wenbostar commented 5 years ago

Which function uses this parameter (BSPARAM = IrlbaParam())? I can't find it in the quickCluster function.

grep BSPARAM prepare.Rmd
clustF <- quickCluster(sceF, use.ranks=FALSE, BSPARAM=IrlbaParam())

wenbostar commented 5 years ago

The problem was fixed after I installed scran from https://github.com/MarioniLab/scran. It looks like the version of scran (1.11.20) from the Bioc-devel branch doesn't have the parameter "BSPARAM".

LTLA commented 5 years ago

Hm... looks like Bioconductor's source package is somehow out of date with the reference manual on the same page. I bumped the version a few days ago, so this will get fixed with 1.11.21; in the meantime, just use the GitHub version.
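
A script that depends on BSPARAM= could guard against the stale build explicitly. This is a minimal sketch, assuming (as stated above) that 1.11.21 is the first Bioconductor build to carry the argument. The helper name has_bsparam_build is hypothetical.

```r
# Hypothetical helper: TRUE if the given scran version is at least the build
# expected to include the BSPARAM= argument (assumed here to be 1.11.21).
has_bsparam_build <- function(installed, fixed_in = "1.11.21") {
  package_version(installed) >= package_version(fixed_in)
}

# In a real session, pass as.character(packageVersion("scran")) instead of a
# literal version string.
print(has_bsparam_build("1.11.20"))
print(has_bsparam_build("1.11.21"))
```

If the check fails, installing from the GitHub repository as described above is the workaround.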