bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

Error running bcbioRNASeq from within bcbio: there is no package called ‘bcbioRNASeq’ #3565

Open amizeranschi opened 2 years ago

amizeranschi commented 2 years ago

Hello!

I'm trying to run a bulk RNA-seq analysis using the following template:

# Template for human RNA-seq using Illumina prepared samples
---
details:
  - analysis: RNA-seq
    genome_build: sacCer3
    algorithm:
## for hg38, change the aligner to hisat2
      aligner: hisat2
      tools_on: bcbiornaseq
      bcbiornaseq:
        organism: saccharomyces cerevisiae
        interesting_groups: panel
upload:
  dir: ../final

However, this ends with the following error:

[2021-11-26T07:15Z] Storing in local filesystem: /home/user/bcbio-runs/rna-seq/rna-seq-analysis/final/2021-11-26_rna-seq-analysis/tpm/tximport-tpm.csv
[2021-11-26T07:15Z] Storing in local filesystem: /home/user/bcbio-runs/rna-seq/rna-seq-analysis/final/2021-11-26_rna-seq-analysis/counts/tximport-counts.csv
[2021-11-26T07:15Z] Storing in local filesystem: /home/user/bcbio-runs/rna-seq/rna-seq-analysis/final/2021-11-26_rna-seq-analysis/tx2gene.csv
[2021-11-26T07:15Z] Storing directory in local filesystem: /home/user/bcbio-runs/rna-seq/rna-seq-analysis/final/2021-11-26_rna-seq-analysis/transcriptome
[2021-11-26T07:15Z] multiprocessing: upload_samples_project
[2021-11-26T07:15Z] Storing in local filesystem: /home/user/bcbio-runs/rna-seq/rna-seq-analysis/final/2021-11-26_rna-seq-analysis/bcbio-nextgen.log
[2021-11-26T07:15Z] multiprocessing: upload_samples_project
[2021-11-26T07:15Z] multiprocessing: upload_samples_project
[2021-11-26T07:15Z] Storing in local filesystem: /home/user/bcbio-runs/rna-seq/rna-seq-analysis/final/2021-11-26_rna-seq-analysis/bcbio-nextgen.log
[2021-11-26T07:15Z] multiprocessing: upload_samples_project
[2021-11-26T07:15Z] multiprocessing: upload_samples_project
[2021-11-26T07:15Z] Storing in local filesystem: /home/user/bcbio-runs/rna-seq/rna-seq-analysis/final/2021-11-26_rna-seq-analysis/bcbio-nextgen.log
[2021-11-26T07:15Z] Timing: bcbioRNAseq loading
[2021-11-26T07:15Z] multiprocessing: run_bcbiornaseqload
[2021-11-26T07:15Z] Loading bcbioRNASeq object.
[2021-11-26T07:15Z] Error in library(bcbioRNASeq) : there is no package called ‘bcbioRNASeq’
[2021-11-26T07:15Z] Execution halted
[2021-11-26T07:15Z] Uncaught exception occurred
Traceback (most recent call last):
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/provenance/do.py", line 26, in run
    _do_run(cmd, checks, log_stdout, env=env)
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/provenance/do.py", line 106, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
subprocess.CalledProcessError: Command '/home/user/bcbio-nextgen/anaconda/bin/Rscript --vanilla /home/user/bcbio-runs/rna-seq/rna-seq-analysis/final/bcbioRNASeq/load_bcbioRNAseq.R
Error in library(bcbioRNASeq) : there is no package called ‘bcbioRNASeq’
Execution halted
' returned non-zero exit status 1.
Traceback (most recent call last):
  File "/home/user/bcbio-nextgen/anaconda/bin/bcbio_nextgen.py", line 245, in <module>
    main(**kwargs)
  File "/home/user/bcbio-nextgen/anaconda/bin/bcbio_nextgen.py", line 46, in main
    run_main(**kwargs)
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/pipeline/main.py", line 50, in run_main
    fc_dir, run_info_yaml)
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/pipeline/main.py", line 91, in _run_toplevel
    for xs in pipeline(config, run_info_yaml, parallel, dirs, samples):
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/pipeline/main.py", line 290, in rnaseqpipeline
    run_parallel("run_bcbiornaseqload", [sample])
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/distributed/multi.py", line 28, in run_parallel
    return run_multicore(fn, items, config, parallel=parallel)
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/distributed/multi.py", line 86, in run_multicore
    for data in joblib.Parallel(parallel["num_jobs"], batch_size=1, backend="multiprocessing")(joblib.delayed(fn)(*x) for x in items):
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/joblib/parallel.py", line 1048, in __call__
    if self.dispatch_one_batch(iterator):
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/joblib/parallel.py", line 866, in dispatch_one_batch
    self._dispatch(tasks)
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/joblib/parallel.py", line 784, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
    result = ImmediateResult(func)
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 572, in __init__
    self.results = batch()
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/joblib/parallel.py", line 263, in __call__
    for func, args, kwargs in self.items]
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/joblib/parallel.py", line 263, in <listcomp>
    for func, args, kwargs in self.items]
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/utils.py", line 59, in wrapper
    return f(*args, **kwargs)
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/distributed/multitasks.py", line 92, in run_bcbiornaseqload
    return bcbiornaseq.make_bcbiornaseq_object(*args)
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/rnaseq/bcbiornaseq.py", line 33, in make_bcbiornaseq_object
    do.run([rcmd, "--vanilla", r_file], "Loading bcbioRNASeq object.")
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/provenance/do.py", line 26, in run
    _do_run(cmd, checks, log_stdout, env=env)
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/provenance/do.py", line 106, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
subprocess.CalledProcessError: Command '/home/user/bcbio-nextgen/anaconda/bin/Rscript --vanilla /home/user/bcbio-runs/rna-seq/rna-seq-analysis/final/bcbioRNASeq/load_bcbioRNAseq.R
Error in library(bcbioRNASeq) : there is no package called ‘bcbioRNASeq’
Execution halted
' returned non-zero exit status 1.

This is strange to see, because the package does seem to be installed in the rbcbiornaseq environment:

$ bcbio_conda list -n rbcbiornaseq r-bcbiornaseq
# packages in environment at /home/user/bcbio-nextgen/anaconda/envs/rbcbiornaseq:
#
# Name                    Version                   Build  Channel
r-bcbiornaseq             0.3.42            r41hdfd78af_0    bioconda
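One detail visible in the traceback above: the failing command is /home/user/bcbio-nextgen/anaconda/bin/Rscript (the base installation), while bcbio_conda list reports the package inside the rbcbiornaseq environment. A minimal Python sketch for checking which Rscript can actually load the package. The helper names are hypothetical and not part of bcbio, and the directory layout is assumed from the paths shown in this post:

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def rscript_path(bcbio_root: str, env: Optional[str] = None) -> Path:
    """Return the Rscript path for the base install or for a named conda env."""
    base = Path(bcbio_root) / "anaconda"
    if env:
        base = base / "envs" / env
    return base / "bin" / "Rscript"


def version_cmd(rscript: Path, package: str) -> List[str]:
    """Build a command that prints the installed version of an R package."""
    expr = f'cat(as.character(packageVersion("{package}")))'
    return [str(rscript), "--vanilla", "-e", expr]


def installed_version(rscript: Path, package: str) -> Optional[str]:
    """Return the package version, or None if Rscript or the package is missing."""
    if not rscript.exists():
        return None
    result = subprocess.run(
        version_cmd(rscript, package), capture_output=True, text=True
    )
    return result.stdout.strip() if result.returncode == 0 else None


if __name__ == "__main__":
    root = "/home/user/bcbio-nextgen"  # path taken from the log above
    for env in (None, "rbcbiornaseq"):
        rscript = rscript_path(root, env)
        print(rscript, installed_version(rscript, "bcbioRNASeq"))
```

If the base Rscript reports None while the environment's Rscript reports a version, the package is installed but the pipeline is calling the wrong interpreter.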

amizeranschi commented 2 years ago

Thanks for the reply @mjsteinbaugh

I'm attaching the file you requested: tx2gene.csv

I'm also attaching a file with the commands I used to set up bcbio, to download the data and to set up the bcbio runs.

The relevant lines for this analysis are 115-166 (downloading the data) and 206-235 (setting up and running the analysis).

Hope this helps.

VM-setup.txt

mjsteinbaugh commented 2 years ago

OK great thanks, I'll work on a fix for this over the weekend and will be in touch soon with an update.

mjsteinbaugh commented 2 years ago

OK this tx2gene issue with the sacCer3 genome should be fixed by the pending update to r-acidgenomes 0.2.20. I'm working on pushing this to bioconda today.

See relevant code change here: https://github.com/acidgenomics/r-acidgenomes/blob/main/R/AllClasses.R#L864

You can check this with your install here:

packageVersion("AcidGenomes")
## 0.2.20
library(AcidGenomes)
tx2gene <- importTx2Gene(
    file = pasteURL(
        "github.com",
        "bcbio",
        "bcbio-nextgen",
        "files",
        "7739401",
        "tx2gene.csv",
        protocol = "https"
    )
)
print(tx2gene)
## Tx2Gene with 7036 rows and 2 columns
##                txId      geneId
##         <character> <character>
## 1       ETS1-1_rRNA      ETS1-1
## 2       ETS1-2_rRNA      ETS1-2
## 3       ETS2-1_rRNA      ETS2-1
## 4       ETS2-2_rRNA      ETS2-2
## 5        HRA1_ncRNA        HRA1
## ...             ...         ...
## 7032   YPR202W_mRNA     YPR202W
## 7033   YPR203W_mRNA     YPR203W
## 7034 YPR204C-A_mRNA   YPR204C-A
## 7035   YPR204W_mRNA     YPR204W
## 7036     ZOD1_ncRNA        ZOD1
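For a check that does not depend on the R environment, the mapping file can also be inspected directly. A minimal Python sketch, assuming tx2gene.csv is a two-column, headerless CSV of transcript ID and gene ID (an assumption based on the output above, not confirmed in this thread). summarize_tx2gene is a hypothetical helper:

```python
import csv
import io
from collections import Counter


def summarize_tx2gene(handle):
    """Count rows, malformed rows, and duplicated transcript IDs in a tx2gene CSV."""
    rows = [row for row in csv.reader(handle) if row]
    well_formed = [row for row in rows if len(row) == 2]
    tx_counts = Counter(row[0] for row in well_formed)
    duplicates = sorted(tx for tx, n in tx_counts.items() if n > 1)
    return {
        "rows": len(rows),
        "malformed": len(rows) - len(well_formed),
        "duplicate_tx": duplicates,
    }


# Three rows taken from the printed Tx2Gene object above.
sample = io.StringIO("ETS1-1_rRNA,ETS1-1\nETS1-2_rRNA,ETS1-2\nHRA1_ncRNA,HRA1\n")
print(summarize_tx2gene(sample))
```

For the real file, pass an open file handle instead of the in-memory sample, for example open("tx2gene.csv", newline="").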

naumenko-sa commented 2 years ago

thanks @mjsteinbaugh! I've pinned it in cloudbiolinux: https://github.com/chapmanb/cloudbiolinux/blob/master/contrib/flavor/ngs_pipeline_minimal/packages-conda.yaml#L280

@amizeranschi please let us know if it works for you.

amizeranschi commented 2 years ago

Hello,

Thanks for looking into this. I upgraded bcbio and tools to the latest development version and launched R from the directory ${bcbio_dir}/anaconda/envs/rbcbiornaseq/bin, and the commands mentioned by @mjsteinbaugh above ran successfully. AcidGenomes v. 0.2.20 seems to be available.

However, the bcbio analysis still ended up crashing, this time due to the version of a different package:

[2022-01-09T13:21Z] multiprocessing: run_bcbiornaseqload
[2022-01-09T13:21Z] Loading bcbioRNASeq object.
[2022-01-09T13:21Z] Loading required package: basejump
[2022-01-09T13:21Z] Error: package or namespace load failed for ‘basejump’ in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]):
[2022-01-09T13:21Z]  namespace ‘AcidSingleCell’ 0.1.8 is being loaded, but >= 0.1.9 is required
[2022-01-09T13:21Z] Error: package ‘basejump’ could not be loaded
[2022-01-09T13:21Z] Execution halted
[2022-01-09T13:21Z] Uncaught exception occurred
Traceback (most recent call last):
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/provenance/do.py", line 26, in run
    _do_run(cmd, checks, log_stdout, env=env)
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/provenance/do.py", line 106, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
subprocess.CalledProcessError: Command '/home/user/bcbio-nextgen/anaconda/envs/rbcbiornaseq/bin/Rscript --vanilla /home/user/bcbio-runs/rna-seq/rna-seq-analysis/final/bcbioRNASeq/load_bcbioRNAseq.R
Loading required package: basejump
Error: package or namespace load failed for ‘basejump’ in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]):
 namespace ‘AcidSingleCell’ 0.1.8 is being loaded, but >= 0.1.9 is required
Error: package ‘basejump’ could not be loaded
Execution halted
' returned non-zero exit status 1.
Traceback (most recent call last):
  File "/home/user/bcbio-nextgen/anaconda/bin/bcbio_nextgen.py", line 245, in <module>
    main(**kwargs)
  File "/home/user/bcbio-nextgen/anaconda/bin/bcbio_nextgen.py", line 46, in main
    run_main(**kwargs)
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/pipeline/main.py", line 50, in run_main
    fc_dir, run_info_yaml)
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/pipeline/main.py", line 91, in _run_toplevel
    for xs in pipeline(config, run_info_yaml, parallel, dirs, samples):
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/pipeline/main.py", line 290, in rnaseqpipeline
    run_parallel("run_bcbiornaseqload", [sample])
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/distributed/multi.py", line 28, in run_parallel
    return run_multicore(fn, items, config, parallel=parallel)
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/distributed/multi.py", line 86, in run_multicore
    for data in joblib.Parallel(parallel["num_jobs"], batch_size=1, backend="multiprocessing")(joblib.delayed(fn)(*x) for x in items):
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/joblib/parallel.py", line 1048, in __call__
    if self.dispatch_one_batch(iterator):
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/joblib/parallel.py", line 866, in dispatch_one_batch
    self._dispatch(tasks)
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/joblib/parallel.py", line 784, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
    result = ImmediateResult(func)
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 572, in __init__
    self.results = batch()
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/joblib/parallel.py", line 263, in __call__
    for func, args, kwargs in self.items]
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/joblib/parallel.py", line 263, in <listcomp>
    for func, args, kwargs in self.items]
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/utils.py", line 59, in wrapper
    return f(*args, **kwargs)
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/distributed/multitasks.py", line 92, in run_bcbiornaseqload
    return bcbiornaseq.make_bcbiornaseq_object(*args)
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/rnaseq/bcbiornaseq.py", line 31, in make_bcbiornaseq_object
    do.run([rcmd, "--vanilla", r_file], "Loading bcbioRNASeq object.")
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/provenance/do.py", line 26, in run
    _do_run(cmd, checks, log_stdout, env=env)
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/provenance/do.py", line 106, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
subprocess.CalledProcessError: Command '/home/user/bcbio-nextgen/anaconda/envs/rbcbiornaseq/bin/Rscript --vanilla /home/user/bcbio-runs/rna-seq/rna-seq-analysis/final/bcbioRNASeq/load_bcbioRNAseq.R
Loading required package: basejump
Error: package or namespace load failed for ‘basejump’ in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]):
 namespace ‘AcidSingleCell’ 0.1.8 is being loaded, but >= 0.1.9 is required
Error: package ‘basejump’ could not be loaded
Execution halted
' returned non-zero exit status 1.

mjsteinbaugh commented 2 years ago

You're seeing this error because the conda environment solver isn't working correctly. We should be installing these versions:

r-acidgenomes                 0.2.20   r41hdfd78af_0  bioconda
r-acidexperiment               0.2.2   r41hdfd78af_0  bioconda
r-acidsinglecell               0.1.9   r41hdfd78af_0  bioconda
r-basejump                   0.14.23   r41hdfd78af_0  bioconda
r-bcbiobase                   0.6.22   r41hdfd78af_0  bioconda
r-bcbiornaseq                 0.3.44   r41hdfd78af_0  bioconda
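One way to compare an environment against this list is to parse the output of bcbio_conda list -n rbcbiornaseq. A minimal Python sketch, not part of bcbio: it treats the versions above as minimums (an assumption, suggested by the ">= 0.1.9 is required" message) and assumes purely numeric dotted versions. The function names are hypothetical:

```python
# Versions listed above, treated here as minimum versions (assumption).
REQUIRED = {
    "r-acidgenomes": "0.2.20",
    "r-acidexperiment": "0.2.2",
    "r-acidsinglecell": "0.1.9",
    "r-basejump": "0.14.23",
    "r-bcbiobase": "0.6.22",
    "r-bcbiornaseq": "0.3.44",
}


def parse_version(text):
    """Turn '0.14.23' into (0, 14, 23) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def parse_conda_list(output):
    """Map package name to version from 'conda list' text, skipping '#' lines."""
    versions = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and not line.startswith("#"):
            versions[fields[0]] = fields[1]
    return versions


def outdated(installed, required=REQUIRED):
    """Return {name: (found, minimum)} for missing or too-old packages."""
    problems = {}
    for name, minimum in required.items():
        found = installed.get(name)
        if found is None or parse_version(found) < parse_version(minimum):
            problems[name] = (found, minimum)
    return problems


# Two versions reported in this thread; build and channel columns omitted.
listing = "# Name Version\nr-acidsinglecell 0.1.8\nr-bcbiobase 0.6.21\n"
for name, (found, minimum) in outdated(parse_conda_list(listing)).items():
    print(f"{name}: found {found}, need >= {minimum}")
```

Numeric tuples are used instead of string comparison because "0.14.3" sorts after "0.14.23" as text.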

amizeranschi commented 2 years ago

@naumenko-sa

Could you pin these package versions as well in cloudbiolinux?

naumenko-sa commented 2 years ago

I've added https://github.com/chapmanb/cloudbiolinux/blob/master/contrib/flavor/ngs_pipeline_minimal/packages-conda.yaml#L281, please try again!

amizeranschi commented 2 years ago

Thanks, but that doesn't seem to be enough to get everything installed as it should. You might have to pin all six package versions mentioned above.

Error: package or namespace load failed for ‘bcbioRNASeq’ in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]):
 namespace ‘bcbioBase’ 0.6.21 is being loaded, but >= 0.6.22 is required
Execution halted
' returned non-zero exit status 1.
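Since these failures surface one package at a time, it may help to collect every version conflict from a saved log in one pass. A minimal Python sketch, assuming the R message format shown in the logs in this thread. find_version_conflicts is a hypothetical helper, not part of bcbio:

```python
import re

# Matches R's "namespace ‘pkg’ X is being loaded, but >= Y is required".
# Both curly and straight quotes are accepted around the package name.
CONFLICT = re.compile(
    r"namespace [‘'](?P<pkg>[^’']+)[’'] "
    r"(?P<loaded>\d+(?:\.\d+)*) is being loaded, "
    r"but >= (?P<required>\d+(?:\.\d+)*) is required"
)


def find_version_conflicts(log_text):
    """Return unique (package, loaded, required) tuples in order of appearance."""
    seen = []
    for match in CONFLICT.finditer(log_text):
        item = (match["pkg"], match["loaded"], match["required"])
        if item not in seen:
            seen.append(item)
    return seen


# Lines copied from the two error messages in this thread.
log = (
    "[2022-01-09T13:21Z]  namespace ‘AcidSingleCell’ 0.1.8 is being loaded, "
    "but >= 0.1.9 is required\n"
    " namespace ‘bcbioBase’ 0.6.21 is being loaded, but >= 0.6.22 is required\n"
)
for pkg, loaded, required in find_version_conflicts(log):
    print(f"{pkg}: loaded {loaded}, required >= {required}")
```

Note that R stops at the first failing namespace, so a single log only shows the conflict reached first. Fixing one can reveal the next, as happened here.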

naumenko-sa commented 2 years ago

please try again, I hope we got all of them now

amizeranschi commented 2 years ago

Thanks a lot. We're definitely making progress.

This time, bcbiornaseq complains about the ref-transcripts.gtf file for sacCer3:

🧪 ### featureCounts
→ Importing aligned counts from featureCounts.
→ Importing '/home/user/bcbio-runs/rna-seq/rna-seq-analysis/final/2022-01-11_rna-seq-analysis/featureCounts/combined.counts' using data.table::`fread()`.
🧪 ## Feature metadata
bcbio GTF file:
/home/user/bcbio-nextgen/genomes/Scerevisiae/sacCer3/rnaseq/ref-transcripts.gtf
→ Making <GRanges> from GFF file ('ref-transcripts.gtf').
→ Getting GFF metadata for 'ref-transcripts.gtf'.
Error: Failed to detect provider (e.g. "Ensembl") from 'ref-transcripts.gtf'.
Backtrace:
    █
 1. └─bcbioRNASeq::bcbioRNASeq(...)
 2.   └─AcidGenomes::makeGRangesFromGFF(...)
 3.     └─AcidGenomes:::.makeGRangesFromRtracklayer(...)
 4.       └─AcidGenomes::getGFFMetadata(file)
 5.         └─AcidCLI::abort(...)
 6.           └─cli::cli_abort(x)
Execution halted
' returned non-zero exit status 1.

I have checked now and that GTF file doesn't have any header. It was installed as part of the sacCer3 genome by bcbio.
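The header check can be reproduced with a few lines of Python. This is a minimal sketch that only lists the leading comment lines of a GTF file. How AcidGenomes actually detects the provider is not shown in this thread, so the sketch makes no claim about that logic. gtf_header_lines is a hypothetical helper, and the record written to the temporary file is made up for illustration, not copied from ref-transcripts.gtf:

```python
import os
import tempfile
from itertools import takewhile


def gtf_header_lines(path):
    """Return the leading '#' comment lines of a GTF file (empty list if none)."""
    with open(path, encoding="utf-8") as handle:
        header = takewhile(lambda line: line.startswith("#"), handle)
        return [line.rstrip("\n") for line in header]


# Illustrative file with a single made-up record and no header lines.
record = 'chr1\tsource\texon\t1\t100\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n'
with tempfile.TemporaryDirectory() as tmp:
    gtf = os.path.join(tmp, "example.gtf")
    with open(gtf, "w", encoding="utf-8") as out:
        out.write(record)
    header = gtf_header_lines(gtf)
    print("header lines:", header if header else "none")
```

Pointing the function at /home/user/bcbio-nextgen/genomes/Scerevisiae/sacCer3/rnaseq/ref-transcripts.gtf should return an empty list if the file has no header, as described above.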

mjsteinbaugh commented 2 years ago

Thanks @amizeranschi can you post that GTF file so I can take a look and work on a fix?

amizeranschi commented 2 years ago

Sure thing, here you go. I changed the extension to .txt so that GitHub would accept it.

ref-transcripts.gtf.txt

mjsteinbaugh commented 2 years ago

OK, this appears to be fixed in the development version of r-acidgenomes, which is not yet suitable for deployment on bioconda. I'll post an update when I finish rolling out a stable release that includes this fix.

https://github.com/acidgenomics/r-acidgenomes/tree/develop

amizeranschi commented 2 years ago

@mjsteinbaugh

Would you consider adding support in bcbiornaseq for differential affinity in ChIP-seq and ATAC-seq peaks? Given that bcbio produces consensus peaks and computes read counts, these could be used in DESeq2 exactly like in the RNA-seq scenario.

https://bcbio-nextgen.readthedocs.io/en/latest/contents/atac.html#differential-affinity-analysis

mjsteinbaugh commented 2 years ago

@amizeranschi OK I think this should be fixed on bioconda.

Melisa-Magallanes commented 2 years ago

Hello, I'm getting a similar error when trying to install trinity with conda: ERROR conda.core.link:_execute(730): An error occurred while installing package 'bioconda::bioconductor-go.db-3.14.0-r41hdfd78af_0'. I've tried many conda versions, but the error persists. What can I do to fix it?

mjsteinbaugh commented 2 years ago

Hi @Melisa-Magallanes thanks for the update -- I'll try clean installing bcbio and see if I can reproduce

amizeranschi commented 2 years ago

@Melisa-Magallanes

If your error is similar to what I've been seeing (post-link script failed for package bioconda::bioconductor-go.db-3.14.0-r41hdfd78af_0), this is a relatively common problem at the moment and it's being addressed.

Have a look here: https://github.com/bioconda/bioconda-recipes/issues/36499#issuecomment-1217214789

mjsteinbaugh commented 2 years ago

Thanks for the update! I'll see if we can come up with a fix in bioconda-recipes.

amizeranschi commented 2 years ago

Great, thanks a lot. Please have a look at bioconductor-org.hs.eg.db as well. I've been getting a similar error with it while attempting to install bcbio.

Edit: I've submitted a couple of pull requests. https://github.com/bioconda/bioconda-recipes/pull/36554 https://github.com/bioconda/bioconda-recipes/pull/36555

mjsteinbaugh commented 1 year ago

@amizeranschi r-bcbiornaseq has been updated to 0.5.1 on bioconda. I'm working on updating this in the main bcbio-nextgen install with @naumenko-sa

naumenko-sa commented 1 year ago

Thanks @mjsteinbaugh !