aertslab / SCENIC

SCENIC is an R package to infer Gene Regulatory Networks and cell types from single-cell RNA-seq data.
http://scenic.aertslab.org
GNU General Public License v3.0

R crashes on SCENIC Initialization #238

Closed NickNolan closed 2 years ago

NickNolan commented 2 years ago

I am trying to use SCENIC; however, R crashes as soon as I try to run initializeScenic().

sessionInfo() is as follows:

sessionInfo()
R version 4.1.3 (2022-03-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] SCENIC_1.2.4                dplyr_1.0.8                 monocle3_1.0.0              SingleCellExperiment_1.16.0
 [5] SummarizedExperiment_1.24.0 GenomicRanges_1.46.1        GenomeInfoDb_1.30.1         IRanges_2.28.0             
 [9] S4Vectors_0.32.3            MatrixGenerics_1.6.0        matrixStats_0.61.0          Biobase_2.54.0             
[13] BiocGenerics_0.40.0         Matrix_1.4-0               

loaded via a namespace (and not attached):
 [1] viridis_0.6.2          httr_1.4.2             bit64_4.0.5            viridisLite_0.4.0      R.utils_2.11.0        
 [6] shiny_1.7.1            blob_1.2.2             GenomeInfoDbData_1.2.7 AUCell_1.16.0          pillar_1.7.0          
[11] RSQLite_2.2.12         lattice_0.20-45        glue_1.6.2             digest_0.6.29          promises_1.2.0.1      
[16] XVector_0.34.0         colorspace_2.0-3       R.oo_1.24.0            htmltools_0.5.2        httpuv_1.6.5          
[21] plyr_1.8.6             GSEABase_1.56.0        XML_3.99-0.9           pkgconfig_2.0.3        zlibbioc_1.40.0       
[26] purrr_0.3.4            xtable_1.8-4           scales_1.1.1           later_1.3.0            tibble_3.1.6          
[31] annotate_1.72.0        KEGGREST_1.34.0        generics_0.1.2         ggplot2_3.3.5          ellipsis_0.3.2        
[36] cachem_1.0.6           cli_3.2.0              magrittr_2.0.2         crayon_1.5.1           mime_0.12             
[41] memoise_2.0.1          R.methodsS3_1.8.1      fansi_1.0.2            graph_1.72.0           tools_4.1.3           
[46] data.table_1.14.2      lifecycle_1.0.1        stringr_1.4.0          munsell_0.5.0          DelayedArray_0.20.0   
[51] AnnotationDbi_1.56.2   Biostrings_2.62.0      compiler_4.1.3         rlang_1.0.2            grid_4.1.3            
[56] RCurl_1.98-1.6         rstudioapi_0.13        bitops_1.0-7           gtable_0.3.0           DBI_1.1.2             
[61] reshape2_1.4.4         R6_2.5.1               gridExtra_2.3          fastmap_1.1.0          bit_4.0.4             
[66] utf8_1.2.2             stringi_1.7.6          Rcpp_1.0.8.2           vctrs_0.3.8            png_0.1-7             
[71] tidyselect_1.1.2

I have a Windows machine with 16 GB of RAM.

Code I ran is as follows:

library(Matrix)
library(monocle3)
library(dplyr)
library(SCENIC)

org = 'hgnc'
dbs = defaultDbNames[[org]]
scenicOptions <- initializeScenic(org=org, dbDir="databases", dbs=dbs)

Here, databases is a folder in the working directory that contains the required .feather files from the cisTarget database. Running initializeScenic() prints the following messages:

Motif databases selected:
  hg19-500bp-upstream-7species.mc9nr.feather
  hg19-tss-centered-10kb-7species.mc9nr.feather
[1] "invalid first argument"
[1] "invalid first argument"
Using the column 'NA' as feature index for the ranking database.

The R session then aborts altogether with a fatal error. I have tried specifying the number of cores and omitting dbs from the initialization; neither fixed the issue.

I am aware that this is similar to Issue #93; however, I don't think that the Issue has really been resolved in full -- it was only marked that way, with no explanation as to how to overcome the problem.

s-aibar commented 2 years ago

Dear @NickNolan ,

Sorry, I was not aware that we had received new posts on that issue. I have now re-opened it.

Most issues when initializing SCENIC come from loading the databases, due to either a) incomplete downloads or b) inconsistent package versions (we had to change the interface for loading the .feather files from the feather package to arrow).

Could you confirm whether you can properly load the databases with these commands?

dbPath <-  "~/Downloads/cisTarget_databases/mm9-500bp-upstream-7species.mc9nr.feather" # Choose the appropriate database/location
SCENIC::dbLoadingAttempt(dbPath)

rnk <- RcisTarget::importRankings(dbPath, indexCol="features", columns=c("Sox10","Dlx1")) # you can load only a few genes to do tests faster
rnk

If the crash is indeed loading the databases:

  1. Please make sure the sha256 sums of the databases match the ones from the server: https://resources.aertslab.org/cistarget/databases/sha256sum.txt
  2. Are you using the latest version of RcisTarget? (& what about feather and arrow?) i.e.
    packageVersion("RcisTarget")
    packageVersion("arrow")
    packageVersion("feather")

NickNolan commented 2 years ago

Hey @s-aibar,

No worries -- thank you for such a quick response. It looks like at least one of these issues applies, as shown below:

> dbPath <-  "databases/hg19-tss-centered-10kb-7species.mc9nr.feather"
> SCENIC::dbLoadingAttempt(dbPath)
[1] "invalid first argument"
[1] FALSE
> rnk <- RcisTarget::importRankings(dbPath, indexCol="features", columns=c("Sox10","Dlx1"))
Error in .getIndexCol(allColumns, indexCol = indexCol, verbose = warnMissingColumns) : 
  The index column 'features' is not available in the file.

> dbPath <- "databases/hg19-500bp-upstream-7species.mc9nr.feather"
> SCENIC::dbLoadingAttempt(dbPath)
[1] "invalid first argument"
[1] FALSE
> rnk <- RcisTarget::importRankings(dbPath, indexCol="features", columns=c("Sox10","Dlx1"))
Error in .getIndexCol(allColumns, indexCol = indexCol, verbose = warnMissingColumns) : 
  The index column 'features' is not available in the file.

Checking the sha256 sums:

> library(digest)
> dbPath <-  "databases/hg19-tss-centered-10kb-7species.mc9nr.feather"
> digest(dbPath, 'sha256', file=TRUE)
[1] "ecdac9c5e70b9faa61a0fb7914a40942912327bd54ebab716578be2b1d4f4d1c"
> dbPath <- "databases/hg19-500bp-upstream-7species.mc9nr.feather"
> digest(dbPath, 'sha256', file=TRUE)
[1] "6688688cea5bc04540214d6161ac5ea9ec6e957c1f9689f5dc636666ab241bf7"

These do not appear to match the sha256 sums you listed, so at a glance it would seem to be a database issue. For reference, the two database .feather files were downloaded as follows:

> download.file('https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg19/refseq_r45/mc9nr/gene_based/hg19-500bp-upstream-7species.mc9nr.feather', destfile='databases/hg19-500bp-upstream-7species.mc9nr.feather')
trying URL 'https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg19/refseq_r45/mc9nr/gene_based/hg19-500bp-upstream-7species.mc9nr.feather'
Content type '€’àû' length 1092309888 bytes (1041.7 MB)
downloaded 1041.7 MB

> download.file('https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg19/refseq_r45/mc9nr/gene_based/hg19-tss-centered-10kb-7species.mc9nr.feather', destfile='databases/hg19-tss-centered-10kb-7species.mc9nr.feather')
trying URL 'https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg19/refseq_r45/mc9nr/gene_based/hg19-tss-centered-10kb-7species.mc9nr.feather'
Content type '€’àû' length 1092309888 bytes (1041.7 MB)
downloaded 1041.7 MB

Lastly, package versions are below:

> packageVersion("RcisTarget")
[1] ‘1.14.0’
> packageVersion("arrow")
[1] ‘7.0.0’
> packageVersion("feather")
[1] ‘0.3.5’

Thank you again for your help! Please let me know if there's any other information I can give you to help.

s-aibar commented 2 years ago

OK, then it seems you first need to download the databases successfully (the download.file command in the vignette is convenient, but apparently not very reliable... :( ). Here are some alternatives: https://resources.aertslab.org/cistarget/help.html

Once you manage to get the same sums, the rest will hopefully work (you seem to have the correct versions). Please, let us know whether that is the case :)
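
For readers who want to automate the comparison, here is a minimal sketch of checking local files against the server's checksum list. It assumes that sha256sum.txt uses the usual sha256sum layout (one hash, whitespace, then a file name per line), which is not confirmed in this thread. The helper name verify_db_checksums and its arguments are hypothetical; only digest::digest() is taken from the commands above.

```r
library(digest)  # same package used earlier in the thread

# Hypothetical helper: compare local files against a sha256sum-style manifest.
# Assumption: each manifest line looks like "<hash>  <file name>".
verify_db_checksums <- function(dbFiles, manifestLines) {
  manifestLines <- trimws(manifestLines)
  manifestLines <- manifestLines[nzchar(manifestLines)]

  # Everything before the first whitespace is the hash
  hashes <- tolower(sub("[[:space:]].*$", "", manifestLines))
  # Drop the hash, the separator and an optional "*" (binary-mode marker)
  fileNames <- basename(sub("^[^[:space:]]+[[:space:]]+\\*?", "", manifestLines))
  expected <- setNames(hashes, fileNames)

  observed <- vapply(
    dbFiles,
    function(f) digest(f, algo = "sha256", file = TRUE),
    character(1)
  )
  wanted <- unname(expected[basename(dbFiles)])  # NA if the file is not listed

  data.frame(
    file = basename(dbFiles),
    observed = unname(observed),
    expected = wanted,
    match = !is.na(wanted) & unname(observed) == wanted,
    stringsAsFactors = FALSE
  )
}

# Possible usage (needs network access, so it is left commented out):
# manifest <- readLines("https://resources.aertslab.org/cistarget/databases/sha256sum.txt")
# dbFiles <- list.files("databases", pattern = "\\.feather$", full.names = TRUE)
# verify_db_checksums(dbFiles, manifest)
```

A row with match = FALSE means the file should be downloaded again before calling initializeScenic().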

NickNolan commented 2 years ago

This appears to have solved my issues -- thank you immensely for your help! (good to know that I can't trust download.file anymore... though I can't say I really understand why I can't trust it)
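
On the open question of why download.file produced bad files: a likely explanation, not confirmed in this thread, is the write mode. On Windows, download.file() writes in text mode (mode = "w") by default unless the URL ends in one of a few known binary extensions, and .feather is not among them. Text mode rewrites line endings, which corrupts binary files. The default timeout of 60 seconds can also cut off large downloads. The sketch below works around both; the wrapper name download_binary and its arguments are hypothetical.

```r
library(digest)  # used only for the optional checksum check

# Hypothetical wrapper around utils::download.file() for binary database files.
download_binary <- function(url, destfile, expectedSha256 = NULL, timeoutSec = 3600) {
  # Raise the timeout for this call only; the previous value is restored on exit
  oldOpts <- options(timeout = max(timeoutSec, getOption("timeout")))
  on.exit(options(oldOpts), add = TRUE)

  # mode = "wb" writes the bytes unchanged; text mode ("w") may alter them on Windows
  status <- utils::download.file(url, destfile = destfile, mode = "wb")
  if (status != 0) stop("download.file() returned status ", status)

  # Optionally verify the result against a known sha256 sum
  if (!is.null(expectedSha256)) {
    observed <- digest(destfile, algo = "sha256", file = TRUE)
    if (!identical(tolower(observed), tolower(expectedSha256))) {
      stop("Checksum mismatch for ", destfile, ": got ", observed)
    }
  }
  invisible(destfile)
}
```

Passing the expected hash from the server's sha256sum.txt as expectedSha256 makes a corrupted download fail immediately instead of surfacing later as a crash in initializeScenic().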