bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

R package version error when combining tx2gene CSV files #3674

Open dnbuckley opened 2 years ago

dnbuckley commented 2 years ago

I receive the following error when running the bcbio RNA-seq pipeline. It appears to be an R package version conflict, but I'm not sure how to resolve it. Thanks in advance for any advice.

[2022-09-13T09:42Z] a01-14.hpc.usc.edu: tx2gene file /scratch2/dnbuckle/test_bcbio/RNAseq/test_multifile/rundir/work/inputs/transcriptome/hg38-tx2gene.csv created from /project/salhia_618/bin/bcbio/bcbio_bin/genomes/Hsapiens/hg38/rnaseq/ref-transcripts.gtf.
[2022-09-13T09:42Z] a01-14.hpc.usc.edu: Combining tx2gene CSV files.
[2022-09-13T09:42Z] a01-14.hpc.usc.edu: Loading tximport.
[2022-09-13T09:43Z] a01-14.hpc.usc.edu: WARNING: ignoring environment value of R_HOME
[2022-09-13T09:43Z] a01-14.hpc.usc.edu: Error: package or namespace load failed for ‘tidyverse’ in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]):
[2022-09-13T09:43Z] a01-14.hpc.usc.edu:  namespace ‘vctrs’ 0.3.8 is being loaded, but >= 0.4.1 is required
[2022-09-13T09:43Z] a01-14.hpc.usc.edu: In addition: Warning message:
[2022-09-13T09:43Z] a01-14.hpc.usc.edu: package ‘tidyverse’ was built under R version 4.1.3
[2022-09-13T09:43Z] a01-14.hpc.usc.edu: Execution halted
[2022-09-13T09:43Z] a01-14.hpc.usc.edu: Uncaught exception occurred
Traceback (most recent call last):
  File "/project/salhia_618/bin/bcbio/bcbio_bin/anaconda/lib/python3.7/site-packages/bcbio/provenance/do.py", line 26, in run
    _do_run(cmd, checks, log_stdout, env=env)
  File "/project/salhia_618/bin/bcbio/bcbio_bin/anaconda/lib/python3.7/site-packages/bcbio/provenance/do.py", line 106, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
subprocess.CalledProcessError: Command '/project/salhia_618/bin/bcbio/bcbio_bin/anaconda/bin/Rscript --vanilla -e library(tidyverse);salmon_files = list.files("/scratch2/dnbuckle/test_bcbio/RNAseq/test_multifile/rundir/work/salmon", pattern="quant.sf", recursive=TRUE, full.names=TRUE);tx2gene = readr::read_csv("/scratch2/dnbuckle/test_bcbio/RNAseq/test_multifile/rundir/work/inputs/transcriptome/tx2gene.csv", col_names=c("transcript", "gene")); samples = basename(dirname(salmon_files));names(salmon_files) = samples;txi = tximport::tximport(salmon_files, type="salmon", tx2gene=tx2gene, countsFromAbundance="lengthScaledTPM", dropInfReps=TRUE);readr::write_csv(round(txi$counts) %>% as.data.frame() %>% tibble::rownames_to_column("gene"), "/scratch2/dnbuckle/test_bcbio/RNAseq/test_multifile/rundir/work/bcbiotx/tmpcl0xtk4o/tximport-counts.csv");readr::write_csv(txi$abundance %>% as.data.frame() %>% tibble::rownames_to_column("gene"), "/scratch2/dnbuckle/test_bcbio/RNAseq/test_multifile/rundir/work/bcbiotx/tmp7r9u8il2/tximport-tpm.csv");
WARNING: ignoring environment value of R_HOME
Error: package or namespace load failed for ‘tidyverse’ in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]):
 namespace ‘vctrs’ 0.3.8 is being loaded, but >= 0.4.1 is required
In addition: Warning message:
package ‘tidyverse’ was built under R version 4.1.3
Execution halted
' returned non-zero exit status 1.
Traceback (most recent call last):
  File "/project/salhia_618/bin/bcbio/bcbio_bin/anaconda/bin/bcbio_nextgen.py", line 245, in <module>
    main(**kwargs)
  File "/project/salhia_618/bin/bcbio/bcbio_bin/anaconda/bin/bcbio_nextgen.py", line 46, in main
    run_main(**kwargs)
  File "/project/salhia_618/bin/bcbio/bcbio_bin/anaconda/lib/python3.7/site-packages/bcbio/pipeline/main.py", line 58, in run_main
    fc_dir, run_info_yaml)
  File "/project/salhia_618/bin/bcbio/bcbio_bin/anaconda/lib/python3.7/site-packages/bcbio/pipeline/main.py", line 91, in _run_toplevel
    for xs in pipeline(config, run_info_yaml, parallel, dirs, samples):
  File "/project/salhia_618/bin/bcbio/bcbio_bin/anaconda/lib/python3.7/site-packages/bcbio/pipeline/main.py", line 264, in rnaseqpipeline
    samples = rnaseq.combine_files(samples)
  File "/project/salhia_618/bin/bcbio/bcbio_bin/anaconda/lib/python3.7/site-packages/bcbio/pipeline/rnaseq.py", line 489, in combine_files
    tximport = load_tximport(data)
  File "/project/salhia_618/bin/bcbio/bcbio_bin/anaconda/lib/python3.7/site-packages/bcbio/pipeline/rnaseq.py", line 556, in load_tximport
    do.run([rcmd, "--vanilla", "-e", render_string], f"Loading tximport.")
  File "/project/salhia_618/bin/bcbio/bcbio_bin/anaconda/lib/python3.7/site-packages/bcbio/provenance/do.py", line 26, in run
    _do_run(cmd, checks, log_stdout, env=env)
  File "/project/salhia_618/bin/bcbio/bcbio_bin/anaconda/lib/python3.7/site-packages/bcbio/provenance/do.py", line 106, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
subprocess.CalledProcessError: Command '/project/salhia_618/bin/bcbio/bcbio_bin/anaconda/bin/Rscript --vanilla -e library(tidyverse);salmon_files = list.files("/scratch2/dnbuckle/test_bcbio/RNAseq/test_multifile/rundir/work/salmon", pattern="quant.sf", recursive=TRUE, full.names=TRUE);tx2gene = readr::read_csv("/scratch2/dnbuckle/test_bcbio/RNAseq/test_multifile/rundir/work/inputs/transcriptome/tx2gene.csv", col_names=c("transcript", "gene")); samples = basename(dirname(salmon_files));names(salmon_files) = samples;txi = tximport::tximport(salmon_files, type="salmon", tx2gene=tx2gene, countsFromAbundance="lengthScaledTPM", dropInfReps=TRUE);readr::write_csv(round(txi$counts) %>% as.data.frame() %>% tibble::rownames_to_column("gene"), "/scratch2/dnbuckle/test_bcbio/RNAseq/test_multifile/rundir/work/bcbiotx/tmpcl0xtk4o/tximport-counts.csv");readr::write_csv(txi$abundance %>% as.data.frame() %>% tibble::rownames_to_column("gene"), "/scratch2/dnbuckle/test_bcbio/RNAseq/test_multifile/rundir/work/bcbiotx/tmp7r9u8il2/tximport-tpm.csv");
WARNING: ignoring environment value of R_HOME
Error: package or namespace load failed for ‘tidyverse’ in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]):
 namespace ‘vctrs’ 0.3.8 is being loaded, but >= 0.4.1 is required
In addition: Warning message:
package ‘tidyverse’ was built under R version 4.1.3
Execution halted
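
For anyone triaging a similar log: the line that matters is the namespace version check, not the Python traceback around it. A minimal Python sketch (not part of bcbio; the helper name is hypothetical) that pulls the package, the loaded version, and the required version out of such a message:

```python
import re

# Hypothetical helper, not part of bcbio: matches R's
# "namespace 'pkg' X is being loaded, but >= Y is required" message,
# accepting either curly or straight quotes around the package name.
NAMESPACE_CONFLICT = re.compile(
    r"namespace [‘'](?P<package>[^’']+)[’'] (?P<loaded>\S+) is being loaded, "
    r"but (?P<op>\S+) (?P<required>\S+) is required"
)

def find_namespace_conflict(log_text):
    """Return (package, loaded, op, required), or None if no conflict is reported."""
    match = NAMESPACE_CONFLICT.search(log_text)
    if match is None:
        return None
    return (match["package"], match["loaded"], match["op"], match["required"])

# The relevant line, copied from the log above.
log = " namespace ‘vctrs’ 0.3.8 is being loaded, but >= 0.4.1 is required\n"
print(find_namespace_conflict(log))
```

Here the result points at vctrs 0.3.8 being installed where tidyverse needs 0.4.1 or newer.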

naumenko-sa commented 2 years ago

Hi @dnbuckley! Could you please provide your bcbio YAML config? SN

dnbuckley commented 2 years ago

Yeah, sure. I've removed all but 2 samples because the config was quite long; the remaining samples follow the same pattern.

upload:
  dir: ../final
details:
- files:
  - ../input//SAAAEL.REF_1/SAAAEL.REF_1_R1.fastq.gz
  - ../input//SAAAEL.REF_1/SAAAEL.REF_1_R2.fastq.gz
  metadata:
    batch: BATCH_SAAAEL.REF_1
    category: SAMPLE_SAAAEL.REF_1
  description: SAAAEL.REF_1
  analysis: RNA-seq
  genome_build: hg38
  algorithm:
    aligner: star
    expression_caller:
    - salmon
    - kallisto
    fusion_caller:
    - arriba
    - pizzly
    quality_format: standard
    strandedness: auto
    trim_reads: no
    quantify_genome_alignments: yes
    variantcaller: gatk-haplotype
    tools_off: gatk4
- files:
  - ../input//SAAAEL.REF_10/SAAAEL.REF_10_R1.fastq.gz
  - ../input//SAAAEL.REF_10/SAAAEL.REF_10_R2.fastq.gz
  metadata:
    batch: BATCH_SAAAEL.REF_10
    category: SAMPLE_SAAAEL.REF_10
  description: SAAAEL.REF_10
  analysis: RNA-seq
  genome_build: hg38
  algorithm:
    aligner: star
    expression_caller:
    - salmon
    - kallisto
    fusion_caller:
    - arriba
    - pizzly
    quality_format: standard
    strandedness: auto
    trim_reads: no
    quantify_genome_alignments: yes
    variantcaller: gatk-haplotype
    tools_off: gatk4
resources:
  star:
    cores: 16
    memory: 16G
  machine:
    cores: 64.0
    memory: 245.0
fc_name: BCB_RNAseq

naumenko-sa commented 2 years ago

Thanks, that was useful.

The error is triggered when running tximport in R. Could you please try upgrading the tools with: bcbio_nextgen.py upgrade -u skip --tools

In my test, the new 1.2.9 installation received:

Name                    Version                   Build  Channel
r-vctrs                   0.4.1             r41h7525677_0    conda-forge

and the test RNA-seq run finished OK.

You may also try to update that one package: mamba install r-vctrs=0.4.1 -c conda-forge
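
To confirm whether an environment already satisfies the requirement from the error (vctrs >= 0.4.1), the listing above can be checked mechanically. A minimal Python sketch, assuming `conda list`-style text as input and purely numeric dotted versions; both helper names are hypothetical:

```python
def parse_version(text):
    """Turn a dotted numeric version such as '0.4.1' into a comparable tuple.

    Assumption: no suffixes like '1.0-7' or 'rc1'; those would need extra handling.
    """
    return tuple(int(part) for part in text.split("."))

def installed_version(conda_list_output, package):
    """Find `package` in `conda list`-style text; return its version or None."""
    for line in conda_list_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == package:
            return fields[1]
    return None

# The listing from the comment above.
listing = """\
Name                    Version                   Build  Channel
r-vctrs                   0.4.1             r41h7525677_0    conda-forge
"""

found = installed_version(listing, "r-vctrs")
meets_requirement = found is not None and parse_version(found) >= parse_version("0.4.1")
print(found, meets_requirement)
```

Tuple comparison is used so that, for example, 0.10.0 sorts after 0.4.1, which a plain string comparison would get wrong.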

naumenko-sa commented 2 years ago

Hi @mjsteinbaugh! How are you doing?

bcbioRNASeq failed in the test run on the fresh install:

subprocess.CalledProcessError: Command '/n/data1/cores/bcbio/naumenko/bcbio_install_test2/anaconda/envs/rbcbiornaseq/bin/Rscript --vanilla /n/data1/cores/bcbio/naumenko/_example_bcbio_runs/2_bulk_rnaseq/fast_test_6samples_chr22/seqc/final/bcbioRNASeq/load_bcbioRNAseq.R
Loading required package: basejump
Error: package or namespace load failed for ‘basejump’:
 object ‘URLencode’ is not exported by 'namespace:AcidBase'
Error: package ‘basejump’ could not be loaded
Execution halted
' returned non-zero exit status 1.

load_bcbioRNAseq.R:

library(bcbioRNASeq);
bcb <- bcbioRNASeq(uploadDir="/n/data1/cores/bcbio/naumenko/_example_bcbio_runs/2_bulk_rnaseq/fast_test_6samples_chr22/seqc/final",interestingGroups=c("category"),level="gene",organism="homo sapiens");
flat <- coerceToList(bcb);
saveData(bcb, flat, dir="data")

Any suggestions? Our installation configuration for bcbiornaseq: https://github.com/chapmanb/cloudbiolinux/blob/master/contrib/flavor/ngs_pipeline_minimal/packages-conda.yaml#L274

SN

mjsteinbaugh commented 2 years ago

Hi @naumenko-sa, I'll take a look tomorrow and update the conda recipe if necessary.

mjsteinbaugh commented 2 years ago

@naumenko-sa I think there may be some version flipping going on with the current recipe defined in CloudBioLinux. I'll clean install bcbio this weekend and do some testing. In the meantime, here's the R environment we want that currently works correctly on bioconda:

conda create --name='r-bcbiornaseq@0.4.0' 'r-bcbiornaseq==0.4.0'
conda activate 'r-bcbiornaseq@0.4.0'
R
library(bcbioRNASeq)
utils::sessionInfo()
R version 4.1.3 (2022-03-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Big Sur/Monterey 10.16

Matrix products: default
BLAS/LAPACK: /Users/mike/.conda/envs/r-bcbiornaseq@0.4.0/lib/libopenblasp-r0.3.21.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] bcbioRNASeq_0.4.0           SummarizedExperiment_1.24.0
 [3] Biobase_2.54.0              GenomicRanges_1.46.1
 [5] GenomeInfoDb_1.30.0         IRanges_2.28.0
 [7] S4Vectors_0.32.4            BiocGenerics_0.40.0
 [9] MatrixGenerics_1.6.0        matrixStats_0.62.0

loaded via a namespace (and not attached):
 [1] bitops_1.0-7                bcbioBase_0.7.0
 [3] bit64_4.0.5                 RColorBrewer_1.1-3
 [5] httr_1.4.4                  syntactic_0.5.2
 [7] tools_4.1.3                 utf8_1.2.2
 [9] R6_2.5.1                    AcidGenerics_0.6.0
[11] DBI_1.1.3                   colorspace_2.0-3
[13] tidyselect_1.1.2            processx_3.7.0
[15] DESeq2_1.34.0               bit_4.0.4
[17] compiler_4.1.3              AcidExperiment_0.3.0
[19] AcidSingleCell_0.2.0        cli_3.4.1
[21] DelayedArray_0.20.0         scales_1.2.1
[23] genefilter_1.76.0           stringr_1.4.1
[25] AcidMarkdown_0.1.6          XVector_0.34.0
[27] pkgconfig_2.0.3             sessioninfo_1.2.2
[29] AcidPlyr_0.2.0              fastmap_1.1.0
[31] limma_3.50.3                rlang_1.0.5
[33] AcidPlots_0.4.0             RSQLite_2.2.8
[35] BiocIO_1.4.0                generics_0.1.3
[37] BiocParallel_1.28.3         dplyr_1.0.10
[39] RCurl_1.98-1.8              magrittr_2.0.3
[41] GenomeInfoDbData_1.2.7      patchwork_1.1.2
[43] Matrix_1.4-1                Rcpp_1.0.9
[45] munsell_0.5.0               fansi_1.0.3
[47] lifecycle_1.0.2             stringi_1.7.8
[49] edgeR_3.36.0                zlibbioc_1.40.0
[51] goalie_0.6.0                grid_4.1.3
[53] blob_1.2.3                  parallel_4.1.3
[55] crayon_1.5.1                AcidCLI_0.2.0
[57] lattice_0.20-45             Biostrings_2.62.0
[59] splines_4.1.3               annotate_1.72.0
[61] KEGGREST_1.34.0             locfit_1.5-9.6
[63] pipette_0.8.0               knitr_1.40
[65] ps_1.7.1                    pillar_1.8.1
[67] AcidGenomes_0.3.0           geneplotter_1.72.0
[69] AcidBase_0.5.0              XML_3.99-0.10
[71] glue_1.6.2                  data.table_1.14.2
[73] vctrs_0.4.1                 png_0.1-7
[75] gtable_0.3.1                purrr_0.3.4
[77] assertthat_0.2.1            cachem_1.0.6
[79] ggplot2_3.3.6               xfun_0.33
[81] xtable_1.8-4                survival_3.4-0
[83] SingleCellExperiment_1.16.0 tibble_3.1.8
[85] AnnotationDbi_1.56.1        memoise_2.0.1
[87] tximport_1.22.0

mjsteinbaugh commented 2 years ago

Ah OK, I see that the CloudBioLinux recipe has r-bcbiornaseq=0.3.44 pinned. We should be able to upgrade this to 0.4.0, and that will resolve the NAMESPACE issue seen above.

See related pull request https://github.com/chapmanb/cloudbiolinux/pull/403

naumenko-sa commented 2 years ago

Hi @mjsteinbaugh!

Thanks for the update! Are you sure we don't need more pins?

It may work for a new install, but to upgrade successfully I needed to run:

bcbio_nextgen.py upgrade -u skip --tools
mamba install r-acidexperiment=0.3.0 -n rbcbiornaseq
mamba install r-acidplots=0.4.0 -n rbcbiornaseq
mamba install r-bcbiobase=0.7.0 -n rbcbiornaseq
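
These three pins match the versions in the sessionInfo() output posted earlier in the thread (AcidExperiment_0.3.0, AcidPlots_0.4.0, bcbioBase_0.7.0). A rough Python sketch of that comparison, assuming the usual conda naming where an R package Foo is packaged as r-foo; the helper names are hypothetical and not part of bcbio:

```python
import re

def session_versions(session_info_text):
    """Map lower-cased R package names to versions from 'Name_1.2.3' tokens."""
    tokens = re.findall(r"\b([A-Za-z][A-Za-z0-9.]*)_(\d[\w.-]*)", session_info_text)
    return {name.lower(): version for name, version in tokens}

def check_pins(pins, session_info_text):
    """Return {pin: True/False/None}; None means the package is absent from the text."""
    seen = session_versions(session_info_text)
    result = {}
    for pin in pins:
        name, _, version = pin.partition("=")
        # Assumption: conda package 'r-foo' corresponds to R package 'foo'.
        r_name = name[2:] if name.startswith("r-") else name
        result[pin] = (seen[r_name] == version) if r_name in seen else None
    return result

# Excerpt of the sessionInfo() output shown earlier in the thread.
excerpt = """\
 [1] bitops_1.0-7                bcbioBase_0.7.0
[17] compiler_4.1.3              AcidExperiment_0.3.0
[33] AcidPlots_0.4.0             RSQLite_2.2.8
"""

pins = ["r-acidexperiment=0.3.0", "r-acidplots=0.4.0", "r-bcbiobase=0.7.0"]
print(check_pins(pins, excerpt))
```

A False entry would flag a pin that disagrees with the known-good environment; None would mean the package does not appear in the pasted session text at all.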

bcbioRNASeq then loaded and ran successfully, but failed at the end:

→ `fpkm()`
✔ bcbio RNA-seq run imported successfully.
Error in coerceToList(bcb) : could not find function "coerceToList"
Execution halted
' returned non-zero exit status 1.

Do we need to change the wrapper script as well?

SN

mjsteinbaugh commented 2 years ago

It's possible we need more pins to get conda to behave correctly. I'll take a look this week with a bcbio test install and will get back to you soon with an update! I may need to post a minor r-bcbiornaseq update to conda that helps tighten this up a bit.

dnbuckley commented 2 years ago

Hello all, thanks for the suggestions. I successfully ran the bcbio upgrade command and the mamba changes @mjsteinbaugh suggested, but unfortunately I still ran into the same problem.

mjsteinbaugh commented 2 years ago

@naumenko-sa Following up on this, I can push a bioconda update that should fix this issue once the R 4.2 / Bioconductor 3.15 release series is available. Looks like this should be completed in the next week:

https://github.com/conda-forge/r-base-feedstock/pull/216
https://github.com/bioconda/bioconda-recipes/issues/35116
https://conda-forge.org/status/#r-base42

dnbuckley commented 1 year ago

Hi all, I tried a full reinstall of bcbio and this error persists. One thing I have found is that if I re-queue the job with the same config after the crash, it completes and appears to generate the combined.counts file.

mjsteinbaugh commented 1 year ago

@naumenko-sa Bioconductor 3.16 is being pushed to bioconda this week, and I'll work on updating the r-bcbiornaseq recipe later in the week.

Status can be tracked here: https://anaconda.org/bioconda/repo

mjsteinbaugh commented 1 year ago

@naumenko-sa r-bcbiornaseq has been updated to 0.5.1 on bioconda