aertslab / SCENIC

SCENIC is an R package to infer Gene Regulatory Networks and cell types from single-cell RNA-seq data.
http://scenic.aertslab.org
GNU General Public License v3.0

why is this software so crappy? can't even run the first line? #80

Closed tanpuekai closed 5 years ago

tanpuekai commented 5 years ago
  1. I install according to this link: https://rawcdn.githack.com/aertslab/SCENIC/a0a00644b2f3589a3e2bc65486fc5f6cc00f48e1/inst/doc/SCENIC_Setup.html#sample-dataset-download-format and run according to this link: https://rawcdn.githack.com/aertslab/SCENIC/a0a00644b2f3589a3e2bc65486fc5f6cc00f48e1/inst/doc/SCENIC_Running.html#exploringinterpreting_the_results

  2. I have several PCs with 32G RAM each. My PCs are good enough. Also got a cluster account. R3.6 is used.

  3. on the first link, it says it needs three R packages: "GENIE3", "AUCell", "RcisTarget", but as I run thru the steps, it ended up installing over 100 packages.

  4. After many version conflicts, like Rngtools, some just need older version to get by; I did manage to load "SCENIC" into R.

  5. It says it needs those "databases", so I downloaded them all. I found no matter how I download them (using PCs, and clusters), the Sha256 code is always different from what you provided. For example, these are the SHA256 on my side:

    6688688cea5bc04540214d6161ac5ea9ec6e957c1f9689f5dc636666ab241bf7  hg19-500bp-upstream-7species.mc9nr.feather
    ecdac9c5e70b9faa61a0fb7914a40942912327bd54ebab716578be2b1d4f4d1c  hg19-tss-centered-10kb-7species.mc9nr.feather

    and these are what you said they are in your web:

    12576cfc5f19354610831a558ce4ec42780735b7d840e0a06259c080880bcc6e  hg19-500bp-upstream-7species.mc9nr.feather
    20135a199f8883a456e5d0a4de66a3fc0ff33b2d8bd0f7b92dd80a8eaef9fee1  hg19-tss-centered-10kb-7species.mc9nr.feather
  6. OK, let us get into running the first line of SCENIC:

loomPath <- system.file(package="SCENIC", "examples/mouseBrain_toy.loom")

What I found is that loomPath is an empty variable. There is nothing in it after running the above line. And of course, because of that, the following lines all give errors:

> loomPath <- system.file(package="SCENIC", "examples/mouseBrain_toy.loom")
> library(SCopeLoomR)

Attaching package: ‘SCopeLoomR’

The following object is masked from ‘package:base’:

    flush

> loom <- open_loom(loomPath, mode="r")
Error in H5File.open(filename, mode, file_create_pl, file_access_pl) : 
  HDF5-API Errors:
    error #000: C:\pkg\hdf5-1.8.14\src\H5F.c in H5Fopen(): line 591: invalid file name
        class: HDF5
        major: Invalid arguments to routine
        minor: Bad value
> exprMat <- get_dgem(loom)
Error in t(loom[["matrix"]][, ]) : object 'loom' not found
> cellInfo <- get_cellAnnotation(loom)
Error in get_cellAnnotation(loom) : object 'loom' not found
> close_loom(loom)
  7. Since I know that this software is apparently not well engineered, and keeps sending end users down useless sidetracks (for example, here what you apparently want is a gene-by-cell matrix called exprMat), I went to your website to download that "mouseBrain.RData", and generated an exprMat to keep it going.

  8. Then your web says:

    library(SCENIC)
    org="mgi" # or hgnc, or dmel
    dbDir="cisTarget_databases" # RcisTarget databases location
    myDatasetTitle="SCENIC example on Mouse brain" # choose a name for your analysis
    data(defaultDbNames)
    dbs <- defaultDbNames[[org]]
    scenicOptions <- initializeScenic(org=org, dbDir=dbDir, dbs=dbs, datasetTitle=myDatasetTitle, nCores=10) 

    Can I ask why you use = and <- interchangeably? that just makes your software look so crappy.

BTW, what is that dbs=dbs for? i have installed the latest version of your package, but it does not accept this argument:

scenicOptions <- initializeScenic(org=org, dbDir=dbDir,dbs=DBS,
+  datasetTitle=myDatasetTitle, nCores=1) 
Error in initializeScenic(org = org, dbDir = dbDir, dbs = DBS, datasetTitle = myDatasetTitle,  : 
  unused argument (dbs = DBS)

I mean you listed all kinds of boundaries (from 500bp, to 5k, to 10k) here, and you have 7 species, 10 species and all kinds of weird but unexplained versions (like mc8/9nr, what the hell is that?), but it seems you only accept the default?

  9. at some point, you need to run correlation, and the way you do it is:
    corrMat <- cor(t(exprMat_filtered), method="spearman")

what the hell ... you developed a software that needs to load >100 packages full of conflicts just for this equation? And this is the foundation for all your regulatory network construction?

in another document, you said use runCorrelation(exprMat_filtered, scenicOptions) to compute pairwise correlations, the thing is the function runCorrelation() does not exist at all:

>   library(SCENIC)
> runCorrelation
Error: object 'runCorrelation' not found
> sessionInfo()
other attached packages:
 [1] GENIE3_1.4.3                SCopeLoomR_0.4.0           
 [3] SingleCellExperiment_1.6.0  SummarizedExperiment_1.14.0
 [5] DelayedArray_0.10.0         BiocParallel_1.17.18       
 [7] matrixStats_0.54.0          Biobase_2.44.0             
 [9] GenomicRanges_1.36.0        GenomeInfoDb_1.20.0        
[11] IRanges_2.18.0              S4Vectors_0.22.0           
[13] BiocGenerics_0.30.0         RcisTarget_1.2.1           
[15] AUCell_1.4.1                SCENIC_1.1.0-01 
  10. In your tutorial page, which is where I landed after clicking a link on the main page of this github project, you said:

    Running SCENIC
    SCENIC (Single Cell rEgulatory Network Inference and Clustering)
    Vignette built on Feb 07, 2019 with SCENIC version 1.1.1.9.

    Seems pretty recently updated, but wait a min, where the hell is the version 1.1.1.9?

    >devtools::install_github("aertslab/SCENIC", ref="v1.1.1.9")
    Error: HTTP error 422.
    No commit found for SHA: v1.1.1.9
    
    Rate limit remaining: 59/60
    Rate limit reset at: 2019-05-23 07:44:05 UTC
  11. OK, so since cor(t(exprMat_filtered), method="spearman") is a built-in command and is guaranteed to work, however crappy the formula might seem, I went ahead with this step and continued to the next step, runGenie3(exprMat_L, scenicOptions). Thank goodness, this step works.

So I then proceeded into runSCENIC_1/2/3. Luckily, runSCENIC_1_coexNetwork2modules(scenicOptions) seems to be working, but then I encountered serious setbacks at runSCENIC_2:

>runSCENIC_2_createRegulons(scenicOptions)
15:09   Step 2. Identifying regulons
Error: evaluation nested too deeply: infinite recursion / options(expressions=)?
Execution halted
  12. OK, it might be that the feather package is too old, so I re-installed the latest version. But then came this:

    
    15:54   Step 2. Identifying regulons
    tfModulesSummary:
    
    top5perTarget top10perTarget          top50           w001           w005
            21             32             41             43             43
    15:54   RcisTarget: Calculating AUC
    Scoring database:  [Source file: hg19-500bp-upstream-7species.mc9nr.feather]
    Scoring database:  [Source file: hg19-tss-centered-10kb-7species.mc9nr.feather]
    15:59   RcisTarget: Adding motif annotation
    Number of motifs in the initial enrichment: 130110
    Number of motifs annotated to the corresponding TF: 10
    15:59   RcisTarget: Prunning targets
    Number of motifs that support the regulons: 10
    Error in regulons[[tf]] : subscript out of bounds
    Calls: runSCENIC_2_createRegulons -> sapply -> lapply -> FUN -> sort -> unique
    Execution halted


13. obviously other lines do not get into running well. Full of all kinds of bugs and errors, and there is no way one (even with 10 yrs of R experiences) can successfully run thru your software on a very small dataset.

14. This is certainly the worst engineered software we have seen in recent years in the BI community. A good BI package should:
- require minimum dependencies, so as to reduce conflicts
- be as straightforward and simple to use as possible
- well explain the input format and output
- maybe have a simple example that anyone will succeed in running thru it
- for your tutorial page, you should really state where it will take long time to compute, for whatever input size, and let users have an idea of how long to wait.
- pls streamline your tutorial page.
hummuscience commented 5 years ago

My guess is that your attitude and tone won't get you any answer from the developers. Your whole post could be rewritten to be more respectful and less demanding. This is, after all, free and open-source. The whole point of putting it up here is for you (with 10 years of R experience) to help out and improve it (or make it more robust), not to bash it into oblivion.

By the way, I ran the whole pipeline with a few issues that were resolved easily within two days. While running the dataset, did you have a look at how much RAM your system was using? I noticed that my swap file was getting full. All the issues I had came down to too little RAM. Using 50 GB of RAM solved all of them. Why don't you try that?

tanpuekai commented 5 years ago

> My guess is that your attitude and tone won't get you any answer from the developers. Your whole post could be rewritten to be more respectful and less demanding. This is, after all, free and open-source. The whole point of putting it up here is for you (with 10 years of R experience) to help out and improve it (or make it more robust), not to bash it into oblivion.
>
> By the way, I ran the whole pipeline with a few issues that were resolved easily within two days. While running the dataset, did you have a look at how much RAM your system was using? I noticed that my swap file was getting full. All the issues I had came down to too little RAM. Using 50 GB of RAM solved all of them. Why don't you try that?

I already listed over 10 reasons why it is horrible BI software, with solid support. It should not have been accepted in the first place. There is plenty of good BI software, most of which sticks to the basic principles of good software engineering, like ease of use (maybe a one-line command), well-defined input, well-explained output, and minimal dependencies.

It is not true that just because it is free and open-source, it need not stick to the basic principles of good software engineering.

s-aibar commented 5 years ago

Dear @tanpuekai ,

I understand your frustration with the versions and dependencies... as developers of this package we have also had to suffer them. That's why we have also created Docker and Singularity images. They reduce dependency issues to the minimum, so you might want to check them:

If you prefer to stick to R: It seems that most of your trouble was caused by following a tutorial that does not match your SCENIC version. To see the tutorial matching your local version you can use any of these commands (if the vignettes are installed locally): vignette("SCENIC_Running") or vignetteFile <- file.path(system.file('doc', package='SCENIC'), "SCENIC_Running.Rmd"); file.copy(vignetteFile, "SCENIC_myRun.Rmd")
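A minimal sketch of that check in plain R, using only base functions. The helper name `localVignettePath` is hypothetical and not part of SCENIC. It relies on `system.file()` returning an empty string when the package or file is not found, which would also explain an empty `loomPath` when the example file is not shipped with the installed version:

```r
# Hypothetical helper (not part of SCENIC): locate a vignette file that ships
# with the locally installed version of a package.
localVignettePath <- function(pkg, vignetteFile) {
  # system.file() returns "" when the package or the file is not found
  path <- system.file("doc", vignetteFile, package = pkg)
  if (identical(path, "")) NA_character_ else path
}

# Report the installed version first, so the matching tutorial can be identified
if (requireNamespace("SCENIC", quietly = TRUE)) {
  print(packageVersion("SCENIC"))
}

vignettePath <- localVignettePath("SCENIC", "SCENIC_Running.Rmd")
if (is.na(vignettePath)) {
  message("No local SCENIC_Running.Rmd found; the vignettes may not be installed.")
} else {
  # Copy the version-matched tutorial to the working directory for editing
  file.copy(vignettePath, "SCENIC_myRun.Rmd")
}
```

Checking for `NA` (or `""`) before passing a path on avoids the chain of follow-up errors that an empty path produces in later calls.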

Also, be aware that the tutorial in vignette/SCENIC_Running.Rmd mostly uses the "wrapper functions" for running the pipeline "as is". If you prefer to run the workflow "manually" (command by command) you can always take the detailed vignettes as templates (file names starting with detailedStep_...), and modify them according to your needs... It requires some time and understanding of the workflow... but it is more flexible, and you can decide which dependencies to install...

davisidarta commented 5 years ago

@tanpuekai

Despite your attitude, I would like to share that SCENIC is also available as a one-line command via Nextflow. Check here.

I also recommend changing your mindset about 'bioinformatics packages' having ease of use as a top priority. Of course the user interface is important, but we have to face the fact that nowadays Systems Biology tends to lead us towards increasingly complex algorithms (and, as such, with more dependencies and more prone to little glitches from the user).

Packages will eventually return errors even on vignettes, and whining about it won't change that. What can change it is writing a good, non-aggressive issue that addresses the problem you're facing and documents it further. Sometimes it may be your own mistake at play (for example -> dbs = DBS, it obviously won't recognize it as the variable since R is case-sensitive).
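For anyone debugging a similar call, the two failure modes involved can be told apart with toy functions in plain R. This is only a sketch: `initToy` and `initToyWithDbs` are hypothetical stand-ins, not SCENIC functions. R reports `unused argument` when the function has no parameter of that name, and `object ... not found` when the parameter exists but the variable passed to it (here `DBS`, as opposed to `dbs`) is undefined:

```r
# Hypothetical stand-ins for a version without and with a `dbs` parameter
initToy <- function(org, dbDir) paste(org, dbDir)
initToyWithDbs <- function(org, dbDir, dbs) {
  force(dbs)  # evaluate the argument so an undefined variable is detected
  paste(org, dbDir)
}

dbs <- c("example.feather")  # lower-case variable exists; `DBS` does not

# Case 1: the function does not accept `dbs` at all
errUnused <- tryCatch(
  initToy("mgi", "cisTarget_databases", dbs = dbs),
  error = conditionMessage
)

# Case 2: the function accepts `dbs`, but the variable name has the wrong case
errNotFound <- tryCatch(
  initToyWithDbs("mgi", "cisTarget_databases", dbs = DBS),
  error = conditionMessage
)

print(errUnused)
print(errNotFound)
```

Reading which of the two messages appears tells you whether to check the spelling of the variable or the argument list of the installed version, for example with `args(initializeScenic)`.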