LieberInstitute / SPEAQeasy

SPEAQeasy: portable LIBD RNA-seq pipeline using Nextflow. Check http://research.libd.org/SPEAQeasy-example/ for an example on how to use this pipeline and analyze the resulting output files.
http://lieberinstitute.github.io/SPEAQeasy
MIT License

BuildAnnotationObjects fails #99

Closed by martinezvbs 1 year ago

martinezvbs commented 1 year ago

Hello,

I am getting the following error:

Error executing process > 'BuildAnnotationObjects'

Caused by:
  Process `BuildAnnotationObjects` terminated with an error exit status (1)

Command executed:

  Rscript build_annotation_objects.R -r hg38 -s hg38_gencode_v40_main

Command exit status:
  1

Command output:
  (empty)

Command error:

      anyDuplicated, append, as.data.frame, basename, cbind, colnames,
      dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
      grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
      order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
      rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
      union, unique, unsplit, which.max, which.min

  Loading required package: S4Vectors
  Loading required package: stats4

  Attaching package: ‘S4Vectors’

  The following objects are masked from ‘package:base’:

      expand.grid, I, unname

  Loading required package: IRanges
  Loading required package: XVector
  Loading required package: GenomeInfoDb

  Attaching package: ‘Biostrings’

  The following object is masked from ‘package:base’:

      strsplit

  Loading required package: rafalib
  Loading required package: GenomicRanges
  Loading required package: AnnotationDbi
  Loading required package: Biobase
  Welcome to Bioconductor

      Vignettes contain introductory material; view with
      'browseVignettes()'. To cite Bioconductor, see
      'citation("Biobase")', and for packages 'citation("pkgname")'.

  Loading required package: usethis

  Attaching package: ‘devtools’

  The following object is masked from ‘package:rafalib’:

      install_bioc

  Error in download.file(url, destfile, quiet = TRUE) : 
    cannot open URL 'http://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/chromInfo.txt.gz'
  Calls: getChromInfoFromUCSC ... <Anonymous> -> fetch_table_from_UCSC -> fetch_table_from_url
  Execution halted

I am using R version 4.3.1, BiocManager 1.30.2, and GenomeInfoDb 1.36.2.

Any advice?

Thanks!

Nick-Eagles commented 1 year ago

Hello, thanks for reaching out. My first thought would be to rule out issues accessing that URL that might be independent of SPEAQeasy, as I saw DNS settings were causing issues in #98 for you. E.g. can you run curl -O http://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/chromInfo.txt.gz on the machine where you're running SPEAQeasy?

martinezvbs commented 1 year ago

Hi,

Thanks for your comment.

If I run curl -O http://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/chromInfo.txt.gz I do see the chromInfo download.

I tried the same with the other URLs in the main.nf file, and they worked as well (with both wget and curl). I don't know what the problem could be.

Any advice?

Thanks!

Nick-Eagles commented 1 year ago

Interesting-- what installation method did you choose (e.g. singularity or docker?) and which main script (curious if you're running locally or through SLURM or SGE)? And this may sound silly, but have you tried resuming (just to rule out temporary internet or server hiccups-- especially on computing clusters, where temporary issues seem more common)?

martinezvbs commented 1 year ago

Hi,

I installed SPEAQeasy with docker (24.0.6, build ed223bc). Right now I am running everything locally to understand how it works, and then I will 'translate' it to servers. I have not tried 'resuming' since I am working on a personal computer. Should I try that? Is it like 'rebooting'?

Thanks!

Nick-Eagles commented 1 year ago

I just meant resuming SPEAQeasy, which I think is worthwhile to rule out temporary issues

martinezvbs commented 1 year ago

Hi,

I used the -resume argument, but I am still having the same issue:

executor >  local (7)
[skipped  ] process > PullAssemblyFasta      [100%] 1 of 1, stored: 1 ✔
[skipped  ] process > PullGtf                [100%] 1 of 1, stored: 1 ✔
[-        ] process > buildHISATindex        -
[ee/8ea1e7] process > BuildAnnotationObjects [100%] 1 of 1, failed: 1 ✘
[skipped  ] process > PullTranscriptFasta    [100%] 1 of 1, stored: 1 ✔
[-        ] process > BuildKallistoIndex     -
[71/5932ba] process > PreprocessInputs       [100%] 1 of 1 ✔
[3c/301530] process > QualityUntrimmed       [  0%] 0 of 25

This is what it looks like. I was previously working over the Yale VPN, but the problem persists with or without the VPN. That is why I was trying to modify the IPv4 settings, but now the problem seems to be in the build_annotation_objects.R script.
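
In longer progress listings, the failing step can be picked out mechanically. The following is a minimal sketch, assuming progress lines in the format pasted above; `failed_processes` is a hypothetical helper name, not part of Nextflow or SPEAQeasy.

```bash
# Hypothetical helper: read Nextflow progress lines on standard input and
# print the name of every process that reports at least one failed task.
failed_processes() {
    awk '/process >/ && /failed: [1-9]/ {
        # The process name is the field right after the ">" separator.
        for (i = 1; i < NF; i++) if ($i == ">") print $(i + 1)
    }'
}

# Example usage (progress.txt is a hypothetical file holding the pasted output):
#   failed_processes < progress.txt
```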

Thanks!

Nick-Eagles commented 1 year ago

Yeah, the problem is this specific line failing. As a quick check, I was just able to run that line interactively with docker and the settings you're using.

Maybe the next thing to check would be if you can download files from the internet within the docker container for that process:

docker run -it libddocker/bioc_kallisto:3.17 bash
curl -O http://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/chromInfo.txt.gz

If that doesn't work, I'm curious if this slight variation does:

docker run -it --net=host libddocker/bioc_kallisto:3.17 bash
curl -O http://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/chromInfo.txt.gz

And if the second works but not the first, I bet changing this line to this and resuming should fix things:

runOptions = '-u $(id -u):$(id -g) --net=host'
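
The two container checks and the decision that follows from them can be sketched as a pair of shell helpers. This is only an illustration, not part of SPEAQeasy: the names `probe_download` and `recommend_network_fix` are hypothetical, and the sketch assumes that `docker` is installed and that `curl` is available inside the image, as in the commands above.

```bash
# Hypothetical helpers (not part of SPEAQeasy) wrapping the two checks above.
IMAGE="libddocker/bioc_kallisto:3.17"
URL="http://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/chromInfo.txt.gz"

# Attempt the download inside the container. Any arguments (e.g. --net=host)
# are passed through to `docker run`. Prints 0 on success and 1 on failure.
probe_download() {
    if docker run --rm "$@" "$IMAGE" curl -sSf -o /dev/null "$URL" >/dev/null 2>&1; then
        echo 0
    else
        echo 1
    fi
}

# Map the two results (0 = download worked) to a suggested next step.
# $1: result with default networking, $2: result with --net=host.
recommend_network_fix() {
    if [ "$1" -eq 0 ]; then
        echo "default networking works: runOptions change is probably unnecessary"
    elif [ "$2" -eq 0 ]; then
        echo "only host networking works: add --net=host to runOptions"
    else
        echo "neither works: check connectivity outside of docker"
    fi
}

# Example usage (requires docker and network access):
#   recommend_network_fix "$(probe_download)" "$(probe_download --net=host)"
```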

martinezvbs commented 1 year ago

Hi,

Both of them worked. I also modified line 12 of the docker file, and after that the pipeline downloaded the PullGtf file, but now I get the following error:

  Error in .order_seqlevels(chrom_sizes[, "chrom"]) : 
    !anyNA(m32) is not TRUE
  Calls: getChromInfoFromUCSC ... .get_chrom_info_for_registered_UCSC_genome -> GET_CHROM_SIZES -> .order_seqlevels -> stopifnot
  Execution halted

I read about this error in previous issues, and I have already updated the R version in the install_software.sh file as follows:

        #  R (4.3.1) -----------------------------------------------------------

        #  Install R
        curl -O https://cran.r-project.org/src/base/R-4/R-4.3.1.tar.gz
        tar -xzf R-4.3.1.tar.gz
        cd R-4.3.1
        ./configure --prefix=$INSTALL_DIR --with-x=no
        make
        make install
        cd $INSTALL_DIR

        #  Install packages that will be used by the pipeline
        ./R-4.3.1/bin/Rscript ../scripts/check_R_packages.R

        BASE_DIR=$(dirname $INSTALL_DIR)
        cd $BASE_DIR

        #  Create the test samples.manifest files
        Software/R-4.3.1/bin/Rscript scripts/make_test_manifests.R -d $(pwd)
        cd $INSTALL_DIR

But the issue persists

Thanks in advance for your time!

Nick-Eagles commented 1 year ago

That's good that the docker configuration change worked. So now it sounds like you're just dealing with #95, which I thought was solved-- actually, I just tested the previously buggy line (or was it this one?) with docker and it's working for me. Are you using a recent version of SPEAQeasy?

Nick-Eagles commented 1 year ago

Also, keep in mind that since you're running things with docker, SPEAQeasy will run R inside of the docker container libddocker/bioc_kallisto:3.17 (latest SPEAQeasy at least), so don't worry about the R version specified for local installations in install_software.sh (or any version of R on your machine)

martinezvbs commented 1 year ago

Hi,

Good to know about the R part. I am working with:

// Pipeline version
version = "0.8.0"

but I am still having the same issue

  Error in .order_seqlevels(chrom_sizes[, "chrom"]) : 
    !anyNA(m32) is not TRUE
  Calls: getChromInfoFromUCSC ... .get_chrom_info_for_registered_UCSC_genome -> GET_CHROM_SIZES -> .order_seqlevels -> stopifnot
  Execution halted

Thanks!

Nick-Eagles commented 1 year ago

Sorry for the confusion-- that version = "0.8.0" line is something we actually haven't been maintaining, and I should probably fix that. The SPEAQeasy version (actually just a commit ID for some point in the version history) gets printed in SPEAQeasy_output.log near the top, and that can help me verify the version you're using is recent.
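
Pulling that commit ID out of the log can be sketched in shell. This is a minimal sketch under one assumption: that the log contains a line of the form `SPEAQeasy version : <commit hash>`. The helper name `speaqeasy_version` is hypothetical and not part of SPEAQeasy.

```bash
# Hypothetical helper: print the commit ID recorded in a SPEAQeasy log file.
# Assumes a line of the form "SPEAQeasy version : <hexadecimal commit hash>".
speaqeasy_version() {
    sed -n 's/^[[:space:]]*SPEAQeasy version[[:space:]]*:[[:space:]]*\([0-9a-f][0-9a-f]*\).*$/\1/p' "$1" |
        head -n 1
}

# Example usage:
#   speaqeasy_version SPEAQeasy_output.log
```

The printed hash can then be compared against the repository history (for example with `git log`) to judge how recent the checkout is.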

martinezvbs commented 1 year ago

Hi,

So it must be this version

Picked up _JAVA_OPTIONS: -Xms5g -Xmx7g
N E X T F L O W  ~  version 20.01.0
Launching `/Documents/Computational/SPEAQeasy/main.nf` [compassionate_aryabhata] - revision: 2d61fda5a4
================================================================================
SPEAQeasy: an RNA-seq analysis pipeline from LIBD
================================================================================
---- Main options:
SPEAQeasy version : 5acceb5960599f2967daf1b1dc2b5233245b38f0

Thanks!

Nick-Eagles commented 1 year ago

Thanks! Interesting-- you are using a recent enough version with the most recent docker image. I'll investigate deeper and see if I can reproduce the issue, and let you know.

Nick-Eagles commented 1 year ago

Hi-- I did reproduce the issue and pushed a fix to the docker image (updated the GenomeInfoDb package to a version without the issue). You should be able to pull the updated image (I think just docker pull libddocker/bioc_kallisto:3.17 overwrites the copy you have downloaded) and resume SPEAQeasy.
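
Whether a pull actually replaced the local copy can be read from the final `Status:` line that `docker pull` prints. The following is a minimal sketch; `pull_status` is a hypothetical helper name, and it only classifies the two status messages it knows about.

```bash
# Hypothetical helper: classify the "Status:" line of `docker pull` output
# read from standard input.
pull_status() {
    status_line="$(grep '^Status:' | tail -n 1 || true)"
    case "$status_line" in
        "Status: Downloaded newer image"*) echo "updated" ;;
        "Status: Image is up to date"*)    echo "unchanged" ;;
        *)                                 echo "unknown" ;;
    esac
}

# Example usage (requires docker and network access):
#   docker pull libddocker/bioc_kallisto:3.17 | pull_status
```

An updated image only helps if the pipeline configuration requests the same tag, so the tag in the docker configuration is worth checking as well.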

martinezvbs commented 1 year ago

Hi,

I tried what you suggested:

3.17: Pulling from libddocker/bioc_kallisto
2ab09b027e7f: Already exists 
93d2561c401b: Already exists 
95e0e5943dfa: Already exists 
5cddf6295dd8: Already exists 
82c4f42e9555: Already exists 
8a57ada09c39: Already exists 
1196a3d8d140: Already exists 
c71c97a09fcb: Already exists 
bfb8d14fd94f: Already exists 
64e1d5b2fb19: Pull complete 
321aed03811e: Pull complete 
229969c27dc9: Pull complete 
3494ff2e87c3: Pull complete 
Digest: sha256:ab4d546ee36b8830ee2ad98947e360602f586fbc6ab36f67d2ff4fb93f24e072
Status: Downloaded newer image for libddocker/bioc_kallisto:3.17
docker.io/libddocker/bioc_kallisto:3.17

and the issue persists

  Error in .order_seqlevels(chrom_sizes[, "chrom"]) : 
    !anyNA(m32) is not TRUE
  Calls: getChromInfoFromUCSC ... .get_chrom_info_for_registered_UCSC_genome -> GET_CHROM_SIZES -> .order_seqlevels -> stopifnot
  Execution halted

Should I try to reinstall everything?

Nick-Eagles commented 1 year ago

Wow, this is embarrassing-- I never updated the image version specified in conf/docker.config to use 3.17. Just pushed that update, which solved the issue on my end. It's probably faster for you to do a find and replace from 3.14 to 3.17 in conf/docker.config, but you could also fully reinstall if you want. Thanks for your patience-- hopefully this solves things for you.
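
The suggested find and replace can be sketched with `sed`. This is a minimal sketch, not an official fix: `update_image_tag` is a hypothetical helper name, and because it replaces every literal `3.14` in the file, the result should be reviewed before resuming.

```bash
# Hypothetical helper: replace every literal "3.14" with "3.17" in a config
# file, keeping the original as <file>.bak so the change can be reviewed.
update_image_tag() {
    config="$1"
    cp "$config" "$config.bak" &&
        sed 's/3\.14/3.17/g' "$config.bak" > "$config"
}

# Example usage, run from the SPEAQeasy directory:
#   update_image_tag conf/docker.config
#   diff conf/docker.config.bak conf/docker.config
```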

martinezvbs commented 1 year ago

Hi,

It works now!

[skipped  ] process > PullAssemblyFasta      [100%] 1 of 1, stored: 1 ✔
[skipped  ] process > PullGtf                [100%] 1 of 1, stored: 1 ✔
[skipped  ] process > buildHISATindex        [100%] 1 of 1, stored: 1 ✔
[db/257b1d] process > BuildAnnotationObjects [100%] 1 of 1 ✔
[skipped  ] process > PullTranscriptFasta    [100%] 1 of 1, stored: 1 ✔
[skipped  ] process > BuildKallistoIndex     [100%] 1 of 1, stored: 1 ✔
[d2/0011da] process > PreprocessInputs       [100%] 1 of 1 ✔
[82/0931b1] process > QualityUntrimmed       [100%] 4 of 4 ✔
[0b/62fec6] process > InferStrandness        [100%] 4 of 4 ✔
[e9/dc6de5] process > CompleteManifest       [100%] 1 of 1 ✔
[a8/65496b] process > Trimming               [100%] 4 of 4 ✔
[-        ] process > QualityTrimmed         -
[02/d8df86] process > SingleEndHISAT         [100%] 4 of 4 ✔
[5c/b6faff] process > BamSort                [100%] 4 of 4 ✔
[4f/11d6c2] process > FeatureCounts          [100%] 4 of 4 ✔
[37/819d92] process > PrimaryAlignments      [100%] 4 of 4 ✔
[52/ff3ebd] process > Junctions              [100%] 4 of 4 ✔
[6e/3fcc68] process > TXQuantKallisto        [100%] 4 of 4 ✔
[de/2e57de] process > CountObjects           [100%] 1 of 1 ✔
[4e/39c3a0] process > VariantCalls           [100%] 4 of 4 ✔
[cb/f34f40] process > VariantsMerge          [100%] 1 of 1 ✔

Completed at: 13-Sep-2023 11:49:53
Duration    : 5m 20s
CPU hours   : 0.4
Succeeded   : 45

I am just curious about QualityTrimmed, but I guess I can do that analysis separately.

Thank you!

Nick-Eagles commented 1 year ago

Great, thanks again for your patience! What I believe you're seeing with QualityTrimmed is a case where none of your samples were trimmed (note the Trimming process, somewhat confusingly, still runs even if trimming doesn't occur). QualityTrimmed runs FastQC only for any samples that were trimmed. Also check out --trim_mode here for more details.

In any case, I'll close this issue.

Best, -Nick