bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

Error running bcbioRNASeq from within bcbio: there is no package called ‘bcbioRNASeq’ #3565

Open amizeranschi opened 2 years ago

amizeranschi commented 2 years ago

Hello!

I'm trying to run a bulk RNA-seq analysis using the following template:

# Template for human RNA-seq using Illumina prepared samples
---
details:
  - analysis: RNA-seq
    genome_build: sacCer3
    algorithm:
## for hg38, change the aligner to hisat2
      aligner: hisat2
      tools_on: bcbiornaseq
      bcbiornaseq:
        organism: saccharomyces cerevisiae
        interesting_groups: panel
upload:
  dir: ../final

However, this ends with the following error:

[2021-11-26T07:15Z] Storing in local filesystem: /home/user/bcbio-runs/rna-seq/rna-seq-analysis/final/2021-11-26_rna-seq-analysis/tpm/tximport-tpm.csv
[2021-11-26T07:15Z] Storing in local filesystem: /home/user/bcbio-runs/rna-seq/rna-seq-analysis/final/2021-11-26_rna-seq-analysis/counts/tximport-counts.csv
[2021-11-26T07:15Z] Storing in local filesystem: /home/user/bcbio-runs/rna-seq/rna-seq-analysis/final/2021-11-26_rna-seq-analysis/tx2gene.csv
[2021-11-26T07:15Z] Storing directory in local filesystem: /home/user/bcbio-runs/rna-seq/rna-seq-analysis/final/2021-11-26_rna-seq-analysis/transcriptome
[2021-11-26T07:15Z] multiprocessing: upload_samples_project
[2021-11-26T07:15Z] Storing in local filesystem: /home/user/bcbio-runs/rna-seq/rna-seq-analysis/final/2021-11-26_rna-seq-analysis/bcbio-nextgen.log
[2021-11-26T07:15Z] multiprocessing: upload_samples_project
[2021-11-26T07:15Z] multiprocessing: upload_samples_project
[2021-11-26T07:15Z] Storing in local filesystem: /home/user/bcbio-runs/rna-seq/rna-seq-analysis/final/2021-11-26_rna-seq-analysis/bcbio-nextgen.log
[2021-11-26T07:15Z] multiprocessing: upload_samples_project
[2021-11-26T07:15Z] multiprocessing: upload_samples_project
[2021-11-26T07:15Z] Storing in local filesystem: /home/user/bcbio-runs/rna-seq/rna-seq-analysis/final/2021-11-26_rna-seq-analysis/bcbio-nextgen.log
[2021-11-26T07:15Z] Timing: bcbioRNAseq loading
[2021-11-26T07:15Z] multiprocessing: run_bcbiornaseqload
[2021-11-26T07:15Z] Loading bcbioRNASeq object.
[2021-11-26T07:15Z] Error in library(bcbioRNASeq) : there is no package called ‘bcbioRNASeq’
[2021-11-26T07:15Z] Execution halted
[2021-11-26T07:15Z] Uncaught exception occurred
Traceback (most recent call last):
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/provenance/do.py", line 26, in run
    _do_run(cmd, checks, log_stdout, env=env)
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/provenance/do.py", line 106, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
subprocess.CalledProcessError: Command '/home/user/bcbio-nextgen/anaconda/bin/Rscript --vanilla /home/user/bcbio-runs/rna-seq/rna-seq-analysis/final/bcbioRNASeq/load_bcbioRNAseq.R
Error in library(bcbioRNASeq) : there is no package called ‘bcbioRNASeq’
Execution halted
' returned non-zero exit status 1.
Traceback (most recent call last):
  File "/home/user/bcbio-nextgen/anaconda/bin/bcbio_nextgen.py", line 245, in <module>
    main(**kwargs)
  File "/home/user/bcbio-nextgen/anaconda/bin/bcbio_nextgen.py", line 46, in main
    run_main(**kwargs)
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/pipeline/main.py", line 50, in run_main
    fc_dir, run_info_yaml)
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/pipeline/main.py", line 91, in _run_toplevel
    for xs in pipeline(config, run_info_yaml, parallel, dirs, samples):
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/pipeline/main.py", line 290, in rnaseqpipeline
    run_parallel("run_bcbiornaseqload", [sample])
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/distributed/multi.py", line 28, in run_parallel
    return run_multicore(fn, items, config, parallel=parallel)
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/distributed/multi.py", line 86, in run_multicore
    for data in joblib.Parallel(parallel["num_jobs"], batch_size=1, backend="multiprocessing")(joblib.delayed(fn)(*x) for x in items):
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/joblib/parallel.py", line 1048, in __call__
    if self.dispatch_one_batch(iterator):
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/joblib/parallel.py", line 866, in dispatch_one_batch
    self._dispatch(tasks)
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/joblib/parallel.py", line 784, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
    result = ImmediateResult(func)
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 572, in __init__
    self.results = batch()
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/joblib/parallel.py", line 263, in __call__
    for func, args, kwargs in self.items]
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/joblib/parallel.py", line 263, in <listcomp>
    for func, args, kwargs in self.items]
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/utils.py", line 59, in wrapper
    return f(*args, **kwargs)
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/distributed/multitasks.py", line 92, in run_bcbiornaseqload
    return bcbiornaseq.make_bcbiornaseq_object(*args)
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/rnaseq/bcbiornaseq.py", line 33, in make_bcbiornaseq_object
    do.run([rcmd, "--vanilla", r_file], "Loading bcbioRNASeq object.")
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/provenance/do.py", line 26, in run
    _do_run(cmd, checks, log_stdout, env=env)
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/provenance/do.py", line 106, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
subprocess.CalledProcessError: Command '/home/user/bcbio-nextgen/anaconda/bin/Rscript --vanilla /home/user/bcbio-runs/rna-seq/rna-seq-analysis/final/bcbioRNASeq/load_bcbioRNAseq.R
Error in library(bcbioRNASeq) : there is no package called ‘bcbioRNASeq’
Execution halted
' returned non-zero exit status 1.

This is strange to see, because the package does seem to be installed in the rbcbiornaseq environment:

$ bcbio_conda list -n rbcbiornaseq r-bcbiornaseq
# packages in environment at /home/user/bcbio-nextgen/anaconda/envs/rbcbiornaseq:
#
# Name                    Version                   Build  Channel
r-bcbiornaseq             0.3.42            r41hdfd78af_0    bioconda

naumenko-sa commented 2 years ago

Hi @amizeranschi !

Thanks for testing and reporting! I've fixed the paths to Rscript: https://github.com/bcbio/bcbio-nextgen/pull/3567
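
For readers following along, the mismatch visible in the report is that the failing command used Rscript from anaconda/bin, while r-bcbiornaseq is installed under anaconda/envs/rbcbiornaseq. A minimal sketch of resolving the interpreter inside a named environment is below. It assumes the standard conda layout (envs/NAME/bin), and the helper name rscript_for_env is hypothetical; this is not the code from the pull request.

```python
import posixpath


def rscript_for_env(anaconda_dir, env_name):
    """Return the Rscript path inside a named conda environment.

    Assumes the standard conda layout: <anaconda_dir>/envs/<env_name>/bin/Rscript.
    The function name is hypothetical, not part of bcbio.
    """
    return posixpath.join(anaconda_dir, "envs", env_name, "bin", "Rscript")


# Path that appears in the failing command in the log.
base_rscript = "/home/user/bcbio-nextgen/anaconda/bin/Rscript"

# Path inside the environment that actually holds r-bcbiornaseq.
env_rscript = rscript_for_env("/home/user/bcbio-nextgen/anaconda", "rbcbiornaseq")

print(env_rscript)
```

Running an R script with env_rscript instead of base_rscript makes library(bcbioRNASeq) search the library tree of the rbcbiornaseq environment.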

I am now getting the error below:

> library(bcbioRNASeq)
Loading required package: basejump
Error: package or namespace load failed for ‘basejump’:
 object ‘metadataBlacklist’ is not exported by 'namespace:AcidBase'
Error: package ‘basejump’ could not be loaded
> library(basejump)
Error: package or namespace load failed for ‘basejump’:
 object ‘metadataBlacklist’ is not exported by 'namespace:AcidBase'

@mjsteinbaugh could you please help us with this error?

Sergey

mjsteinbaugh commented 2 years ago

Hi Sergey, yeah, I'll take a look tonight. If you can post the R session info via sessionInfo(), that will be helpful. I think it's a quick fix.

mjsteinbaugh commented 2 years ago

Basically I just need to know which versions of bcbioRNASeq, basejump, and AcidBase are installed.
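
Besides sessionInfo(), the three versions can also be read from the text output of conda list for the environment. The sketch below parses that output into a dictionary. The function name parse_conda_list is hypothetical, and the sample rows are copied from the package listing posted later in this thread.

```python
def parse_conda_list(text):
    """Map package name -> version from the text output of `conda list`.

    Hypothetical helper: skips blank lines and '#' header lines, then takes
    the first two whitespace-separated fields of each remaining row.
    """
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 2:
            versions[fields[0]] = fields[1]
    return versions


# Rows copied from the listing posted in this thread.
SAMPLE = """\
# Name                    Version                   Build  Channel
r-acidbase                0.4.5             r41hdfd78af_0    bioconda
r-basejump                0.14.22           r41hdfd78af_0    bioconda
r-bcbiornaseq             0.3.42            r41hdfd78af_0    bioconda
"""

versions = parse_conda_list(SAMPLE)
for name in ("r-bcbiornaseq", "r-basejump", "r-acidbase"):
    print(name, versions.get(name, "not installed"))
```

Splitting on whitespace rather than fixed columns keeps the parser working for rows where a long package name pushes the version column to the right.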

For reference, relevant Python code is here: https://github.com/bcbio/bcbio-nextgen/blob/master/bcbio/rnaseq/bcbiornaseq.py

mjsteinbaugh commented 2 years ago

@amizeranschi What version of bcbio-nextgen are you running? Is this the latest development build?

mjsteinbaugh commented 2 years ago

Yeah I think something weird may be going on with the conda environment in that bcbio install. Here's a clean install of the r-bcbiornaseq v0.3.42 recipe:

conda activate r-bcbiornaseq@0.3.42
R
R.version.string
# [1] "R version 4.1.1 (2021-08-10)"
packageVersion("bcbioRNASeq")
# [1] ‘0.3.42’
packageVersion("basejump")
# [1] ‘0.14.22’
packageVersion("AcidBase")
# [1] ‘0.4.5’
suppressPackageStartupMessages({
    library(bcbioRNASeq)
})
# Loads clean

mjsteinbaugh commented 2 years ago

Ah OK the conda recipe issue appears to be in CloudBioLinux here: https://github.com/chapmanb/cloudbiolinux/blob/master/contrib/flavor/ngs_pipeline_minimal/packages-conda.yaml#L272

Adjusting the r-basejump version to latest stable (v0.14.22) instead of v0.14.19 (or removing the r-basejump line in the YAML file) should fix the issue. I need to push a minor bcbioRNASeq update that tightens up the minimum dependency versions a bit -- sorry about that!

bcbioRNASeq R package dependencies are defined in DESCRIPTION file here, for reference: https://github.com/hbc/bcbioRNASeq/blob/master/DESCRIPTION
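
To illustrate the version check implied above, here is a minimal sketch that compares an installed r-basejump version against a minimum. The threshold 0.14.22 is an assumption taken from the recommendation in this comment, not a value read from the DESCRIPTION file, and both function names are hypothetical. It handles plain dotted numeric versions only.

```python
def version_tuple(version):
    """Turn a dotted numeric version such as '0.14.22' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum.

    Compares numerically, component by component, so '0.14.9' sorts
    below '0.14.22' (a plain string comparison would get this wrong).
    """
    return version_tuple(installed) >= version_tuple(minimum)


# ASSUMPTION: 0.14.22 is used as the threshold because it is the version
# recommended in this thread; the true minimum may differ.
ASSUMED_MINIMUM = "0.14.22"

pinned_ok = meets_minimum("0.14.19", ASSUMED_MINIMUM)   # version pinned in the recipe
updated_ok = meets_minimum("0.14.22", ASSUMED_MINIMUM)  # latest stable per this comment

print(pinned_ok, updated_ok)
```

Under that assumed threshold, the pinned v0.14.19 fails the check and v0.14.22 passes.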

naumenko-sa commented 2 years ago

thanks @mjsteinbaugh !

Running mamba install r-basejump=0.14.22 -n rbcbiornaseq in the existing installation got me going. Now I have the following error:

Error in flatFiles(bcb) : could not find function "flatFiles"
Execution halted
' returned non-zero exit status 1.

My versions:

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       1_gnu    conda-forge
_r-mutex                  1.0.1               anacondar_1    conda-forge
binutils_impl_linux-64    2.36.1               h193b22a_2    conda-forge
binutils_linux-64         2.36                 hf3e587d_1    conda-forge
bioconductor-affy         1.72.0            r41hd029910_0    bioconda
bioconductor-affyio       1.64.0            r41hd029910_0    bioconda
bioconductor-all          1.36.0            r41hdfd78af_0    bioconda
bioconductor-annotate     1.72.0            r41hdfd78af_0    bioconda
bioconductor-annotationdbi 1.56.1            r41hdfd78af_0    bioconda
bioconductor-annotationfilter 1.18.0            r41hdfd78af_0    bioconda
bioconductor-annotationhub 3.2.0             r41hdfd78af_0    bioconda
bioconductor-apeglm       1.16.0            r41h399db7b_0    bioconda
bioconductor-beachmat     2.10.0            r41h399db7b_0    bioconda
bioconductor-biobase      2.54.0            r41hd029910_0    bioconda
bioconductor-biocfilecache 2.2.0             r41hdfd78af_0    bioconda
bioconductor-biocgenerics 0.40.0            r41hdfd78af_0    bioconda
bioconductor-biocio       1.4.0             r41hdfd78af_0    bioconda
bioconductor-biocparallel 1.28.0            r41h399db7b_0    bioconda
bioconductor-biocstyle    2.22.0            r41hdfd78af_0    bioconda
bioconductor-biocversion  3.14.0            r41hdfd78af_0    bioconda
bioconductor-biomart      2.50.0            r41hdfd78af_0    bioconda
bioconductor-biostrings   2.62.0            r41hd029910_0    bioconda
bioconductor-clusterprofiler 4.2.0             r41hdfd78af_0    bioconda
bioconductor-complexheatmap 2.10.0            r41hdfd78af_0    bioconda
bioconductor-consensusclusterplus 1.58.0            r41hdfd78af_0    bioconda
bioconductor-degreport    1.30.0            r41hdfd78af_0    bioconda
bioconductor-delayedarray 0.20.0            r41hd029910_0    bioconda
bioconductor-delayedmatrixstats 1.16.0            r41hdfd78af_0    bioconda
bioconductor-deseq2       1.34.0            r41h399db7b_0    bioconda
bioconductor-do.db        2.9                           0    bioconda
bioconductor-dose         3.20.0            r41hdfd78af_0    bioconda
bioconductor-dropletutils 1.14.0            r41h399db7b_0    bioconda
bioconductor-edger        3.36.0            r41h399db7b_0    bioconda
bioconductor-enrichplot   1.14.1            r41hdfd78af_0    bioconda
bioconductor-ensdb.hsapiens.v75 2.99.0           r41hdfd78af_11    bioconda
bioconductor-ensembldb    2.18.1            r41hdfd78af_0    bioconda
bioconductor-fgsea        1.20.0            r41h399db7b_0    bioconda
bioconductor-genefilter   1.76.0            r41hba52eb8_0    bioconda
bioconductor-geneplotter  1.72.0            r41hdfd78af_0    bioconda
bioconductor-genomeinfodb 1.30.0            r41hdfd78af_0    bioconda
bioconductor-genomeinfodbdata 1.2.7             r41hdfd78af_0    bioconda
bioconductor-genomicalignments 1.30.0            r41hd029910_0    bioconda
bioconductor-genomicfeatures 1.46.1            r41hdfd78af_0    bioconda
bioconductor-genomicranges 1.46.0            r41hd029910_0    bioconda
bioconductor-ggtree       3.2.0             r41hdfd78af_0    bioconda
bioconductor-go.db        3.14.0            r41hdfd78af_0    bioconda
bioconductor-gosemsim     2.20.0            r41h399db7b_0    bioconda
bioconductor-graph        1.72.0            r41hd029910_0    bioconda
bioconductor-hdf5array    1.22.0            r41ha2fdcc6_1    bioconda
bioconductor-interactivedisplaybase 1.32.0            r41hdfd78af_0    bioconda
bioconductor-iranges      2.28.0            r41hd029910_0    bioconda
bioconductor-kegggraph    1.54.0            r41hdfd78af_0    bioconda
bioconductor-keggrest     1.34.0            r41hdfd78af_0    bioconda
bioconductor-limma        3.50.0            r41hd029910_0    bioconda
bioconductor-matrixgenerics 1.6.0             r41hdfd78af_0    bioconda
bioconductor-org.hs.eg.db 3.14.0            r41hdfd78af_0    bioconda
bioconductor-org.mm.eg.db 3.14.0            r41hdfd78af_0    bioconda
bioconductor-pathview     1.34.0            r41hdfd78af_0    bioconda
bioconductor-preprocesscore 1.56.0            r41hd029910_0    bioconda
bioconductor-protgenerics 1.26.0            r41hdfd78af_0    bioconda
bioconductor-qvalue       2.26.0            r41hdfd78af_0    bioconda
bioconductor-rgraphviz    2.38.0            r41h399db7b_0    bioconda
bioconductor-rhdf5        2.38.0            r41hfe70e90_1    bioconda
bioconductor-rhdf5filters 1.6.0             r41h399db7b_0    bioconda
bioconductor-rhdf5lib     1.16.0            r41hd029910_0    bioconda
bioconductor-rhtslib      1.26.0            r41hd029910_0    bioconda
bioconductor-rsamtools    2.10.0            r41h399db7b_0    bioconda
bioconductor-rtracklayer  1.54.0            r41ha2fdcc6_1    bioconda
bioconductor-s4vectors    0.32.0            r41hd029910_0    bioconda
bioconductor-scuttle      1.4.0             r41h399db7b_0    bioconda
bioconductor-singlecellexperiment 1.16.0            r41hdfd78af_0    bioconda
bioconductor-sparsematrixstats 1.6.0             r41h399db7b_0    bioconda
bioconductor-summarizedexperiment 1.24.0            r41hdfd78af_0    bioconda
bioconductor-treeio       1.18.0            r41hdfd78af_0    bioconda
bioconductor-tximport     1.22.0            r41hdfd78af_0    bioconda
bioconductor-vsn          3.62.0            r41hd029910_0    bioconda
bioconductor-xvector      0.34.0            r41hd029910_0    bioconda
bioconductor-zlibbioc     1.40.0            r41hd029910_0    bioconda
bwidget                   1.9.14               ha770c72_1    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
c-ares                    1.18.1               h7f98852_0    conda-forge
ca-certificates           2021.10.8            ha878542_0    conda-forge
cairo                     1.16.0            h6cf1ce9_1008    conda-forge
curl                      7.80.0               h2574ce0_0    conda-forge
font-ttf-dejavu-sans-mono 2.37                 hab24e00_0    conda-forge
font-ttf-inconsolata      3.000                h77eed37_0    conda-forge
font-ttf-source-code-pro  2.038                h77eed37_0    conda-forge
font-ttf-ubuntu           0.83                 hab24e00_0    conda-forge
fontconfig                2.13.1            hba837de_1005    conda-forge
fonts-conda-ecosystem     1                             0    conda-forge
fonts-conda-forge         1                             0    conda-forge
freetype                  2.10.4               h0708190_1    conda-forge
fribidi                   1.0.10               h36c2ea0_0    conda-forge
gcc_impl_linux-64         9.4.0               h03d3576_11    conda-forge
gcc_linux-64              9.4.0                h391b98a_1    conda-forge
gettext                   0.19.8.1          h73d1719_1008    conda-forge
gfortran_impl_linux-64    9.4.0               h0003116_11    conda-forge
gfortran_linux-64         9.4.0                hf0ab688_1    conda-forge
gmp                       6.2.1                h58526e2_0    conda-forge
graphite2                 1.3.13            h58526e2_1001    conda-forge
gsl                       2.7                  he838d99_0    conda-forge
gxx_impl_linux-64         9.4.0               h03d3576_11    conda-forge
gxx_linux-64              9.4.0                h0316aca_1    conda-forge
harfbuzz                  3.1.1                h83ec7ef_0    conda-forge
icu                       68.2                 h9c3ff4c_0    conda-forge
jbig                      2.1               h7f98852_2003    conda-forge
jpeg                      9d                   h36c2ea0_0    conda-forge
kernel-headers_linux-64   2.6.32              he073ed8_15    conda-forge
krb5                      1.19.2               hcc1bbae_3    conda-forge
ld_impl_linux-64          2.36.1               hea4e1c9_2    conda-forge
lerc                      3.0                  h9c3ff4c_0    conda-forge
libblas                   3.9.0           12_linux64_openblas    conda-forge
libcblas                  3.9.0           12_linux64_openblas    conda-forge
libcurl                   7.80.0               h2574ce0_0    conda-forge
libdeflate                1.8                  h7f98852_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 h516909a_1    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-devel_linux-64     9.4.0               hd854feb_11    conda-forge
libgcc-ng                 11.2.0              h1d223b6_11    conda-forge
libgfortran-ng            11.2.0              h69a702a_11    conda-forge
libgfortran5              11.2.0              h5c6108e_11    conda-forge
libglib                   2.70.0               h174f98d_1    conda-forge
libgomp                   11.2.0              h1d223b6_11    conda-forge
libiconv                  1.16                 h516909a_0    conda-forge
liblapack                 3.9.0           12_linux64_openblas    conda-forge
libnghttp2                1.43.0               h812cca2_1    conda-forge
libnsl                    2.0.0                h7f98852_0    conda-forge
libopenblas               0.3.18          pthreads_h8fe5266_0    conda-forge
libpng                    1.6.37               h21135ba_2    conda-forge
libsanitizer              9.4.0               h79bfe98_11    conda-forge
libssh2                   1.10.0               ha56f1ee_2    conda-forge
libstdcxx-devel_linux-64  9.4.0               hd854feb_11    conda-forge
libstdcxx-ng              11.2.0              he4da1e4_11    conda-forge
libtiff                   4.3.0                h6f004c6_2    conda-forge
libuuid                   2.32.1            h7f98852_1000    conda-forge
libwebp-base              1.2.1                h7f98852_0    conda-forge
libxcb                    1.13              h7f98852_1003    conda-forge
libxml2                   2.9.12               h72842e0_0    conda-forge
libzlib                   1.2.11            h36c2ea0_1013    conda-forge
lz4-c                     1.9.3                h9c3ff4c_1    conda-forge
make                      4.3                  hd18ef5c_1    conda-forge
ncurses                   6.2                  h58526e2_4    conda-forge
nomkl                     1.0                  h5ca1d4c_0    conda-forge
openssl                   1.1.1l               h7f98852_0    conda-forge
pandoc                    2.16.1               h7f98852_0    conda-forge
pango                     1.48.10              h54213e6_2    conda-forge
pcre                      8.45                 h9c3ff4c_0    conda-forge
pcre2                     10.37                h032f7d1_0    conda-forge
pip                       21.3.1             pyhd8ed1ab_0    conda-forge
pixman                    0.40.0               h36c2ea0_0    conda-forge
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
python                    3.10.0          h62f1059_2_cpython    conda-forge
python_abi                3.10                    2_cp310    conda-forge
r                         4.1             r41hd8ed1ab_1004    conda-forge
r-acidbase                0.4.5             r41hdfd78af_0    bioconda
r-acidcli                 0.1.7             r41hdfd78af_0    bioconda
r-acidexperiment          0.2.2             r41hdfd78af_0    bioconda
r-acidgenerics            0.5.20            r41hdfd78af_0    bioconda
r-acidgenomes             0.2.19            r41hdfd78af_0    bioconda
r-acidgsea                0.6.4             r41hdfd78af_0    bioconda
r-acidmarkdown            0.1.4             r41hdfd78af_0    bioconda
r-acidplots               0.3.9             r41hdfd78af_0    bioconda
r-acidplyr                0.1.22            r41hdfd78af_0    bioconda
r-acidsinglecell          0.1.8             r41hdfd78af_0    bioconda
r-ape                     5.5               r41h306847c_0    conda-forge
r-aplot                   0.1.1             r41hc72bb7e_0    conda-forge
r-ashr                    2.2_47            r41h03ef668_1    conda-forge
r-askpass                 1.1               r41hcfec24a_2    conda-forge
r-assertive               0.3_6             r41hc72bb7e_0    conda-forge
r-assertive.base          0.0_9             r41hc72bb7e_0    conda-forge
r-assertive.code          0.0_3             r41hc72bb7e_2    conda-forge
r-assertive.data          0.0_3             r41hc72bb7e_2    conda-forge
r-assertive.data.uk       0.0_2             r41hc72bb7e_2    conda-forge
r-assertive.data.us       0.0_2             r41hc72bb7e_2    conda-forge
r-assertive.datetimes     0.0_3             r41hc72bb7e_0    conda-forge
r-assertive.files         0.0_2           r41hc72bb7e_1003    conda-forge
r-assertive.matrices      0.0_2             r41hc72bb7e_2    conda-forge
r-assertive.models        0.0_2             r41hc72bb7e_2    conda-forge
r-assertive.numbers       0.0_2           r41hc72bb7e_1003    conda-forge
r-assertive.properties    0.0_4           r41hc72bb7e_1003    conda-forge
r-assertive.reflection    0.0_5             r41hc72bb7e_0    conda-forge
r-assertive.sets          0.0_3           r41hc72bb7e_1003    conda-forge
r-assertive.strings       0.0_3           r41hc72bb7e_1003    conda-forge
r-assertive.types         0.0_3           r41hc72bb7e_1004    conda-forge
r-assertthat              0.2.1             r41hc72bb7e_2    conda-forge
r-backports               1.3.0             r41hcfec24a_0    conda-forge
r-base                    4.1.1                hb93adac_1    conda-forge
r-base64enc               0.1_3           r41hcfec24a_1004    conda-forge
r-basejump                0.14.22           r41hdfd78af_0    bioconda
r-bbmle                   1.0.24            r41hc72bb7e_0    conda-forge
r-bcbiobase               0.6.21            r41hdfd78af_1    bioconda
r-bcbiornaseq             0.3.42            r41hdfd78af_0    bioconda
r-bdsmatrix               1.3_4             r41hcfec24a_1    conda-forge
r-bh                      1.75.0_0          r41hc72bb7e_0    conda-forge
r-biocmanager             1.30.16           r41hc72bb7e_0    conda-forge
r-bit                     4.0.4             r41hcfec24a_0    conda-forge
r-bit64                   4.0.5             r41hcfec24a_0    conda-forge
r-bitops                  1.0_7             r41hcfec24a_0    conda-forge
r-blob                    1.2.2             r41hc72bb7e_0    conda-forge
r-bookdown                0.24              r41hc72bb7e_0    conda-forge
r-boot                    1.3_28            r41hc72bb7e_0    conda-forge
r-brio                    1.1.2             r41hcfec24a_0    conda-forge
r-broom                   0.7.10            r41hc72bb7e_0    conda-forge
r-bslib                   0.3.1             r41hc72bb7e_0    conda-forge
r-cachem                  1.0.6             r41hcfec24a_0    conda-forge
r-callr                   3.7.0             r41hc72bb7e_0    conda-forge
r-caret                   6.0_90            r41hcfec24a_0    conda-forge
r-cellranger              1.1.0           r41hc72bb7e_1003    conda-forge
r-circlize                0.4.13            r41hc72bb7e_0    conda-forge
r-class                   7.3_19            r41hcfec24a_0    conda-forge
r-cli                     3.1.0             r41h03ef668_0    conda-forge
r-clipr                   0.7.1             r41hc72bb7e_0    conda-forge
r-clue                    0.3_60            r41hcfec24a_0    conda-forge
r-cluster                 2.1.2             r41h859d828_0    conda-forge
r-coda                    0.19_4            r41hc72bb7e_0    conda-forge
r-codetools               0.2_18            r41hc72bb7e_0    conda-forge
r-colorspace              2.0_2             r41hcfec24a_0    conda-forge
r-commonmark              1.7             r41hcfec24a_1002    conda-forge
r-conquer                 1.2.1             r41h6dc32e9_0    conda-forge
r-cowplot                 1.1.1             r41hc72bb7e_0    conda-forge
r-cpp11                   0.4.1             r41hc72bb7e_0    conda-forge
r-crayon                  1.4.2             r41hc72bb7e_0    conda-forge
r-crosstalk               1.2.0             r41hc72bb7e_0    conda-forge
r-curl                    4.3.2             r41hcfec24a_0    conda-forge
r-data.table              1.14.2            r41hcfec24a_0    conda-forge
r-dbi                     1.1.1             r41hc72bb7e_0    conda-forge
r-dbplyr                  2.1.1             r41hc72bb7e_0    conda-forge
r-desc                    1.4.0             r41hc72bb7e_0    conda-forge
r-deseqanalysis           0.4.4             r41hdfd78af_0    bioconda
r-diffobj                 0.3.5             r41hcfec24a_0    conda-forge
r-digest                  0.6.28            r41h03ef668_0    conda-forge
r-doparallel              1.0.16            r41hc72bb7e_0    conda-forge
r-downloader              0.4             r41hc72bb7e_1003    conda-forge
r-dplyr                   1.0.7             r41h03ef668_0    conda-forge
r-dqrng                   0.3.0             r41h03ef668_0    conda-forge
r-dt                      0.19              r41hc72bb7e_0    conda-forge
r-e1071                   1.7_9             r41h03ef668_0    conda-forge
r-ellipsis                0.3.2             r41hcfec24a_0    conda-forge
r-emdbook                 1.3.12            r41hc72bb7e_1    conda-forge
r-etrunct                 0.1             r41ha770c72_1002    conda-forge
r-evaluate                0.14              r41hc72bb7e_2    conda-forge
r-fansi                   0.4.2             r41hcfec24a_0    conda-forge
r-farver                  2.1.0             r41h03ef668_0    conda-forge
r-fastmap                 1.1.0             r41h03ef668_0    conda-forge
r-fastmatch               1.1_3             r41hcfec24a_0    conda-forge
r-filelock                1.0.2           r41hcfec24a_1002    conda-forge
r-fontawesome             0.2.2             r41hc72bb7e_0    conda-forge
r-forcats                 0.5.1             r41hc72bb7e_0    conda-forge
r-foreach                 1.5.1             r41hc72bb7e_0    conda-forge
r-foreign                 0.8_81            r41hcfec24a_0    conda-forge
r-formatr                 1.11              r41hc72bb7e_0    conda-forge
r-fs                      1.5.0             r41h03ef668_0    conda-forge
r-futile.logger           1.4.3           r41hc72bb7e_1003    conda-forge
r-futile.options          1.0.1           r41hc72bb7e_1002    conda-forge
r-future                  1.23.0            r41hc72bb7e_0    conda-forge
r-future.apply            1.8.1             r41hc72bb7e_0    conda-forge
r-generics                0.1.1             r41hc72bb7e_0    conda-forge
r-getoptlong              1.0.5             r41hc72bb7e_0    conda-forge
r-ggdendro                0.1.22            r41hc72bb7e_0    conda-forge
r-ggforce                 0.3.3             r41h03ef668_0    conda-forge
r-ggfun                   0.0.4             r41hc72bb7e_0    conda-forge
r-ggnewscale              0.4.5             r41hc72bb7e_0    conda-forge
r-ggplot2                 3.3.5             r41hc72bb7e_0    conda-forge
r-ggplotify               0.1.0             r41hc72bb7e_0    conda-forge
r-ggpmisc                 0.4.4             r41hc72bb7e_0    conda-forge
r-ggpp                    0.4.2             r41hc72bb7e_0    conda-forge
r-ggraph                  2.0.5             r41h03ef668_0    conda-forge
r-ggrepel                 0.9.1             r41h03ef668_0    conda-forge
r-ggridges                0.5.3             r41hc72bb7e_0    conda-forge
r-globaloptions           0.1.2             r41ha770c72_0    conda-forge
r-globals                 0.14.0            r41hc72bb7e_0    conda-forge
r-glue                    1.5.0             r41hcfec24a_0    conda-forge
r-goalie                  0.5.5             r41hdfd78af_0    bioconda
r-gower                   0.2.2             r41hcfec24a_0    conda-forge
r-graphlayouts            0.7.1             r41h03ef668_0    conda-forge
r-gridextra               2.3             r41hc72bb7e_1003    conda-forge
r-gridgraphics            0.5_1             r41hc72bb7e_0    conda-forge
r-gtable                  0.3.0             r41hc72bb7e_3    conda-forge
r-haven                   2.4.3             r41h2713e49_0    conda-forge
r-hexbin                  1.28.2            r41h859d828_0    conda-forge
r-highr                   0.9               r41hc72bb7e_0    conda-forge
r-hms                     1.1.1             r41hc72bb7e_0    conda-forge
r-htmltools               0.5.2             r41h03ef668_0    conda-forge
r-htmlwidgets             1.5.4             r41hc72bb7e_0    conda-forge
r-httpuv                  1.6.3             r41h03ef668_0    conda-forge
r-httr                    1.4.2             r41hc72bb7e_0    conda-forge
r-igraph                  1.2.8             r41he0372cf_0    conda-forge
r-invgamma                1.1               r41hc72bb7e_1    conda-forge
r-ipred                   0.9_12            r41hcfec24a_0    conda-forge
r-irlba                   2.3.3             r41he454529_3    conda-forge
r-isoband                 0.2.5             r41h03ef668_0    conda-forge
r-iterators               1.0.13            r41hc72bb7e_0    conda-forge
r-jquerylib               0.1.4             r41hc72bb7e_0    conda-forge
r-jsonlite                1.7.2             r41hcfec24a_0    conda-forge
r-kernsmooth              2.23_20           r41h742201e_0    conda-forge
r-knitr                   1.35              r41hc72bb7e_0    conda-forge
r-labeling                0.4.2             r41hc72bb7e_0    conda-forge
r-lambda.r                1.2.4             r41hc72bb7e_1    conda-forge
r-lasso2                  1.2_22            r41hcfec24a_0    conda-forge
r-later                   1.2.0             r41h03ef668_0    conda-forge
r-lattice                 0.20_45           r41hcfec24a_0    conda-forge
r-lava                    1.6.10            r41hc72bb7e_0    conda-forge
r-lazyeval                0.2.2             r41hcfec24a_2    conda-forge
r-lifecycle               1.0.1             r41hc72bb7e_0    conda-forge
r-listenv                 0.8.0             r41hc72bb7e_1    conda-forge
r-lmodel2                 1.7_3             r41hc72bb7e_0    conda-forge
r-locfit                  1.5_9.4           r41hcfec24a_1    conda-forge
r-logging                 0.10_108          r41ha770c72_2    conda-forge
r-lubridate               1.8.0             r41h03ef668_0    conda-forge
r-magrittr                2.0.1             r41hcfec24a_1    conda-forge
r-markdown                1.1               r41hcfec24a_1    conda-forge
r-mass                    7.3_54            r41hcfec24a_0    conda-forge
r-matrix                  1.3_4             r41he454529_0    conda-forge
r-matrixmodels            0.5_0             r41hc72bb7e_0    conda-forge
r-matrixstats             0.61.0            r41hcfec24a_0    conda-forge
r-memoise                 2.0.0             r41hc72bb7e_0    conda-forge
r-mgcv                    1.8_38            r41he454529_0    conda-forge
r-mime                    0.12              r41hcfec24a_0    conda-forge
r-mixsqp                  0.3_43            r41h306847c_1    conda-forge
r-mnormt                  2.0.2             r41h859d828_0    conda-forge
r-modelmetrics            1.2.2.2           r41h03ef668_1    conda-forge
r-munsell                 0.5.0           r41hc72bb7e_1003    conda-forge
r-mvtnorm                 1.1_3             r41h859d828_0    conda-forge
r-nlme                    3.1_153           r41h859d828_0    conda-forge
r-nnet                    7.3_16            r41hcfec24a_0    conda-forge
r-nozzle.r1               1.1_1           r41ha770c72_1003    conda-forge
r-numderiv                2016.8_1.1        r41hc72bb7e_3    conda-forge
r-openssl                 1.4.5             r41he36bf35_1    conda-forge
r-openxlsx                4.2.4             r41h03ef668_0    conda-forge
r-parallelly              1.28.1            r41hc72bb7e_0    conda-forge
r-patchwork               1.1.1             r41hc72bb7e_0    conda-forge
r-pheatmap                1.0.12            r41hc72bb7e_2    conda-forge
r-pillar                  1.6.4             r41hc72bb7e_0    conda-forge
r-pipette                 0.7.2             r41hdfd78af_0    bioconda
r-pkgconfig               2.0.3             r41hc72bb7e_1    conda-forge
r-pkgload                 1.2.3             r41h03ef668_0    conda-forge
r-plogr                   0.2.0           r41hc72bb7e_1003    conda-forge
r-plyr                    1.8.6             r41h03ef668_1    conda-forge
r-png                     0.1_7           r41hcfec24a_1004    conda-forge
r-polyclip                1.10_0            r41h03ef668_2    conda-forge
r-polynom                 1.4_0             r41hc72bb7e_2    conda-forge
r-praise                  1.0.0           r41hc72bb7e_1004    conda-forge
r-prettyunits             1.1.1             r41hc72bb7e_1    conda-forge
r-proc                    1.18.0            r41h03ef668_0    conda-forge
r-processx                3.5.2             r41hcfec24a_0    conda-forge
r-prodlim                 2019.11.13        r41h03ef668_1    conda-forge
r-progress                1.2.2             r41hc72bb7e_2    conda-forge
r-progressr               0.9.0             r41hc72bb7e_0    conda-forge
r-promises                1.2.0.1           r41h03ef668_0    conda-forge
r-proxy                   0.4_26            r41hcfec24a_0    conda-forge
r-ps                      1.6.0             r41hcfec24a_0    conda-forge
r-psych                   2.1.9             r41hc72bb7e_0    conda-forge
r-purrr                   0.3.4             r41hcfec24a_1    conda-forge
r-pzfx                    0.3.0             r41hc72bb7e_0    conda-forge
r-quantreg                5.86              r41h52d45c5_0    conda-forge
r-r.methodss3             1.8.1             r41hc72bb7e_0    conda-forge
r-r.oo                    1.24.0            r41hc72bb7e_0    conda-forge
r-r.utils                 2.11.0            r41hc72bb7e_0    conda-forge
r-r6                      2.5.1             r41hc72bb7e_0    conda-forge
r-rappdirs                0.3.3             r41hcfec24a_0    conda-forge
r-rcolorbrewer            1.1_2           r41h785f33e_1003    conda-forge
r-rcpp                    1.0.7             r41h03ef668_0    conda-forge
r-rcpparmadillo           0.10.7.3.0        r41h306847c_0    conda-forge
r-rcppeigen               0.3.3.9.1         r41h306847c_0    conda-forge
r-rcppnumerical           0.4_0             r41h03ef668_1    conda-forge
r-rcurl                   1.98_1.5          r41hcfec24a_0    conda-forge
r-rdrop2                  0.8.2.1           r41hc72bb7e_0    conda-forge
r-readr                   2.0.2             r41h03ef668_0    conda-forge
r-readxl                  1.3.1             r41h2713e49_4    conda-forge
r-recipes                 0.1.17            r41hc72bb7e_0    conda-forge
r-recommended             4.1             r41hd8ed1ab_1004    conda-forge
r-rematch                 1.0.1           r41hc72bb7e_1003    conda-forge
r-rematch2                2.1.2             r41hc72bb7e_1    conda-forge
r-reshape                 0.8.8             r41hcfec24a_2    conda-forge
r-reshape2                1.4.4             r41h03ef668_1    conda-forge
r-restfulr                0.0.13            r41hdf9a8c9_1    bioconda
r-rio                     0.5.27            r41hc72bb7e_0    conda-forge
r-rjson                   0.2.20          r41h03ef668_1002    conda-forge
r-rlang                   0.4.12            r41hcfec24a_0    conda-forge
r-rmarkdown               2.11              r41hc72bb7e_0    conda-forge
r-rpart                   4.1_15            r41hcfec24a_2    conda-forge
r-rprojroot               2.0.2             r41hc72bb7e_0    conda-forge
r-rsqlite                 2.2.8             r41h03ef668_0    conda-forge
r-rstudioapi              0.13              r41hc72bb7e_0    conda-forge
r-rvcheck                 0.1.8             r41hc72bb7e_1    conda-forge
r-sass                    0.4.0             r41h03ef668_0    conda-forge
r-scales                  1.1.1             r41hc72bb7e_0    conda-forge
r-scatterpie              0.1.6             r41hc72bb7e_0    conda-forge
r-sessioninfo             1.2.1             r41hc72bb7e_0    conda-forge
r-shadowtext              0.0.9             r41hc72bb7e_0    conda-forge
r-shape                   1.4.6             r41ha770c72_0    conda-forge
r-shiny                   1.7.1             r41h785f33e_0    conda-forge
r-sitmo                   2.0.2             r41h03ef668_0    conda-forge
r-snow                    0.4_4             r41hc72bb7e_0    conda-forge
r-sourcetools             0.1.7           r41h9c3ff4c_1002    conda-forge
r-sparsem                 1.81              r41h859d828_0    conda-forge
r-spatial                 7.3_14            r41hcfec24a_0    conda-forge
r-splus2r                 1.3_3             r41h859d828_0    conda-forge
r-squarem                 2021.1            r41hc72bb7e_0    conda-forge
r-stringi                 1.7.5             r41hcabe038_0    conda-forge
r-stringr                 1.4.0             r41hc72bb7e_2    conda-forge
r-survival                3.2_13            r41hcfec24a_0    conda-forge
r-syntactic               0.5.0             r41hdfd78af_0    bioconda
r-sys                     3.4               r41hcfec24a_0    conda-forge
r-testthat                3.1.0             r41h03ef668_0    conda-forge
r-tibble                  3.1.6             r41hcfec24a_0    conda-forge
r-tidygraph               1.2.0             r41h03ef668_0    conda-forge
r-tidyr                   1.1.4             r41h03ef668_0    conda-forge
r-tidyselect              1.1.1             r41hc72bb7e_0    conda-forge
r-tidytree                0.3.5             r41hc72bb7e_0    conda-forge
r-timedate                3043.102        r41hc72bb7e_1002    conda-forge
r-tinytex                 0.35              r41hc72bb7e_0    conda-forge
r-tmvnsim                 1.0_2             r41h859d828_3    conda-forge
r-truncnorm               1.0_8           r41hcfec24a_1002    conda-forge
r-tweenr                  1.0.2             r41h03ef668_0    conda-forge
r-tzdb                    0.2.0             r41h03ef668_0    conda-forge
r-upsetr                  1.4.0             r41hc72bb7e_2    conda-forge
r-utf8                    1.2.2             r41hcfec24a_0    conda-forge
r-vctrs                   0.3.8             r41hcfec24a_1    conda-forge
r-viridis                 0.6.2             r41hc72bb7e_0    conda-forge
r-viridislite             0.4.0             r41hc72bb7e_0    conda-forge
r-vroom                   1.5.6             r41h03ef668_0    conda-forge
r-waldo                   0.3.1             r41hc72bb7e_0    conda-forge
r-withr                   2.4.2             r41hc72bb7e_0    conda-forge
r-xfun                    0.28              r41h03ef668_0    conda-forge
r-xml                     3.99_0.8          r41hcfec24a_0    conda-forge
r-xml2                    1.3.2             r41h03ef668_1    conda-forge
r-xtable                  1.8_4             r41hc72bb7e_3    conda-forge
r-xts                     0.12.1            r41hcfec24a_0    conda-forge
r-yaml                    2.2.1             r41hcfec24a_1    conda-forge
r-yulab.utils             0.0.4             r41hc72bb7e_0    conda-forge
r-zip                     2.2.0             r41hcfec24a_0    conda-forge
r-zoo                     1.8_9             r41hcfec24a_1    conda-forge
readline                  8.1                  h46c0cb4_0    conda-forge
sed                       4.8                  he412f7d_0    conda-forge
setuptools                58.5.3          py310hff52083_0    conda-forge
sqlite                    3.36.0               h9cd32fc_2    conda-forge
sysroot_linux-64          2.12                he073ed8_15    conda-forge
tk                        8.6.11               h27826a3_1    conda-forge
tktable                   2.10                 hb7b940f_3    conda-forge
tzdata                    2021e                he74cb21_0    conda-forge
wheel                     0.37.0             pyhd8ed1ab_1    conda-forge
xorg-kbproto              1.0.7             h7f98852_1002    conda-forge
xorg-libice               1.0.10               h7f98852_0    conda-forge
xorg-libsm                1.2.3             hd9c2040_1000    conda-forge
xorg-libx11               1.7.2                h7f98852_0    conda-forge
xorg-libxau               1.0.9                h7f98852_0    conda-forge
xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
xorg-libxext              1.3.4                h7f98852_1    conda-forge
xorg-libxrender           0.9.10            h7f98852_1003    conda-forge
xorg-libxt                1.2.1                h7f98852_2    conda-forge
xorg-renderproto          0.11.1            h7f98852_1002    conda-forge
xorg-xextproto            7.3.0             h7f98852_1002    conda-forge
xorg-xproto               7.0.31            h7f98852_1007    conda-forge
xz                        5.2.5                h516909a_1    conda-forge
zlib                      1.2.11            h36c2ea0_1013    conda-forge
zstd                      1.5.0                ha95c52a_0    conda-forge
mjsteinbaugh commented 2 years ago

@naumenko-sa Where's flatFiles() being called? I'm not seeing this in the bcbio-nextgen source code. Try renaming that to coerceToList() instead -- flatFiles() was made defunct and then later removed in basejump because the function returned an unstructured list from an S4 object, but not actual "flat files" on disk.

mjsteinbaugh commented 2 years ago

Ah nevermind got it, it's here: https://github.com/bcbio/bcbio-nextgen/blob/71c6f97b552ee45a3e3b1d36f675c477a233fcd2/bcbio/rnaseq/bcbiornaseq.py#L128

Yeah rename flatFiles() to coerceToList() instead, and that should fix it. I can push an update to basejump that keeps this deprecated again for the time being.

naumenko-sa commented 2 years ago

Thanks @mjsteinbaugh !

I've fixed that and rda -> rds

The next issue is:

subprocess.CalledProcessError: Command 'bcbio_devel/anaconda/envs/rbcbiornaseq/bin/Rscript --vanilla -e load("/n/data1/cores/bcbio/naumenko/_example_bcbio_runs/2_bulk_rnaseq/seqc/final/bcbioRNASeq/data/bcb.rds");date=format(Sys.time(), "%Y-%m-%d");dir="/n/data1/cores/bcbio/naumenko/_example_bcbio_runs/2_bulk_rnaseq/seqc/final/bcbioRNASeq/data/../results/2021-12-02/gene/counts";library(tidyverse);library(bcbioRNASeq);counts = bcbioRNASeq::counts(bcb) %>% as.data.frame() %>% round() %>% tibble::rownames_to_column("gene");metadata = colData(bcb) %>% as.data.frame() %>% tibble::rownames_to_column("sample");readr::write_csv(counts, file.path(dir, "counts.csv.gz"));readr::write_csv(metadata, file.path(dir, "metadata.csv.gz"));
Error in load("/n/data1/cores/bcbio/naumenko/_example_bcbio_runs/2_bulk_rnaseq/seqc/final/bcbioRNASeq/data/bcb.rds") : 
  bad restore file magic number (file may be corrupted) -- no data loaded
In addition: Warning messages:
1: In readChar(con, 5L, useBytes = TRUE) :
  truncating string with embedded nuls
2: file ‘bcb.rds’ has magic number 'X'
  Use of save versions prior to 2 is deprecated 
Execution halted
' returned non-zero exit status 1.

Could you please take a look?

Sergey

mjsteinbaugh commented 2 years ago

Yeah, for RDS files use object <- readRDS("bcb.rds"). RDS (R data serialized) files are saved without the object name (e.g. "bcb"), which is preferable but means the result has to be assigned to a variable in the current environment. The base R error here is too vague; it should suggest readRDS() instead of load(), and that needs to be improved in a future update.
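For reference, the load()/readRDS() distinction can be sketched on the Python side, where bcbio assembles the R code it passes to Rscript as a string. This is only a minimal sketch: `load_bcb_snippet` is a hypothetical helper for illustration, not the actual function in bcbio/rnaseq/bcbiornaseq.py.

```python
import os


def load_bcb_snippet(path, name="bcb"):
    """Return R code that loads a saved object from `path` into `name`.

    Hypothetical helper for illustration; not part of bcbio-nextgen.
    """
    ext = os.path.splitext(path)[1].lower()
    if ext == ".rds":
        # An RDS file holds one unnamed object, so the result must be assigned.
        return '{name} <- readRDS("{path}");'.format(name=name, path=path)
    # .rda / .RData files restore objects under the names they were saved with.
    return 'load("{path}");'.format(path=path)


print(load_bcb_snippet("data/bcb.rds"))
print(load_bcb_snippet("data/bcb.rda"))
```

The snippet would then be prepended to the rest of the R one-liner in place of the hard-coded load(...) call.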

mjsteinbaugh commented 2 years ago

Another minor thing is that R 4.1 adds support for a native pipe |>, which is more performant for large objects than the magrittr pipe %>%. We may want to switch to this in the bcbio code.

naumenko-sa commented 2 years ago

Thanks @mjsteinbaugh ! Almost there!

With these changes + tools_on: keep_gene_version which keeps transcript versions in tx2gene: https://github.com/bcbio/bcbio-nextgen/pull/3568

ENST00000456328.2,ENSG00000223972
ENST00000450305.2,ENSG00000223972
ENST00000488147.1,ENSG00000227232
ENST00000619216.1,ENSG00000278267
ENST00000473358.1,ENSG00000243485
ENST00000469289.1,ENSG00000243485
ENST00000607096.1,ENSG00000284332
ENST00000417324.1,ENSG00000237613
ENST00000461467.1,ENSG00000237613
ENST00000606857.1,ENSG00000268020
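
To make the effect of keeping transcript versions concrete, here is a small sketch that reads a headerless two-column tx2gene file like the one above, with and without the version suffix. `read_tx2gene` and its `keep_version` flag are hypothetical names for illustration and are not how bcbio implements keep_gene_version.

```python
import csv
import io


def read_tx2gene(handle, keep_version=True):
    """Map transcript IDs to gene IDs from a headerless two-column CSV.

    Hypothetical helper for illustration; not part of bcbio-nextgen.
    """
    mapping = {}
    for row in csv.reader(handle):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        tx, gene = row[0], row[1]
        if not keep_version:
            # Drop the ".N" version suffix: ENST00000456328.2 -> ENST00000456328
            tx = tx.split(".", 1)[0]
        mapping[tx] = gene
    return mapping


sample = io.StringIO(
    "ENST00000456328.2,ENSG00000223972\n"
    "ENST00000488147.1,ENSG00000227232\n"
)
versioned = read_tx2gene(sample)
sample.seek(0)
unversioned = read_tx2gene(sample, keep_version=False)
print(sorted(versioned))
print(sorted(unversioned))
```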

I am getting:

→ Importing '/n/data1/cores/bcbio/naumenko/_example_bcbio_runs/3_bulk_rnaseq_6samples_chr22_fast/seqc/final/2021-12-02_seqc/bcbio-nextgen-commands.log' using base::`readLines()`.
🧪 ## Sample metadata
→ Getting sample metadata from YAML.
Loading a subset of samples:
• HBRR_rep1
• HBRR_rep2
• HBRR_rep3
• UHRR_rep1
• UHRR_rep2
• UHRR_rep3
→ Getting sample quality control metrics from YAML.
🧪 ## Counts
🧪 ### tximport
→ Importing '/n/data1/cores/bcbio/naumenko/_example_bcbio_runs/3_bulk_rnaseq_6samples_chr22_fast/seqc/final/2021-12-02_seqc/tx2gene.csv' using data.table::`fread()`.
→ Importing salmon transcript-level counts from 'quant.sf' files using tximport 1.22.0.
countsFromAbundance: lengthScaledTPM
txOut: TRUE
reading in files with read_tsv
1 2 3 4 5 6 
Error in .isTximportReturn(txi) : Assert failure.
[2] identical(rownames(infReps[[1L]]), rownames(abundance)) is not TRUE.
Calls: bcbioRNASeq -> .tximport -> assert -> .isTximportReturn -> assert
Execution halted
' returned non-zero exit status 1.

The sample sheet:

samplename,description,category
UHRR_rep1,UHRR_rep1,UHRR
HBRR_rep1,HBRR_rep1,HBRR
UHRR_rep2,UHRR_rep2,UHRR
HBRR_rep2,HBRR_rep2,HBRR
UHRR_rep3,UHRR_rep3,UHRR
HBRR_rep3,HBRR_rep3,HBRR

The yaml template:

details:
  - analysis: RNA-seq
    genome_build: hg38
    algorithm:
      quality_format: standard
      aligner: false
      strandedness: unstranded
      tools_on:
      - bcbiornaseq
      - keep_gene_version
      bcbiornaseq:
        organism: homo sapiens
        interesting_groups: category
upload:
  dir: ../final
resources:
  star:
    cores: 10
    memory: 10G

The basic tximport companion works ok: https://github.com/bcbio/bcbio-nextgen/blob/master/bcbio/scripts/R/bcbio2se.R

Sergey

mjsteinbaugh commented 2 years ago

OK cool I'll take a look and see if we need to publish any fixes in the package

mjsteinbaugh commented 2 years ago

@naumenko-sa Hi Sergey, following up on this, I'm working on a code update this week and will ping you back soon.

amizeranschi commented 2 years ago

@mjsteinbaugh @naumenko-sa

Thanks a lot for looking into this. Please let me know if I can do anything to help with testing.

naumenko-sa commented 2 years ago

@mjsteinbaugh sorry for bugging you, any luck with the update? We need to release bcbio 1.2.9 this week, and I'd be happy to include the updated bcbioRNASeq rather than pin the r35 version.

mjsteinbaugh commented 2 years ago

@naumenko-sa Yep totally, I'll work on fixing it this week ASAP. What's your timeline for the 1.2.9 release?

naumenko-sa commented 2 years ago

Thanks Michael! Honestly, this issue is the main release blocker for now. We have already fixed PureCN, snpEff 5.0, and picard, which were also blocking issues. Releasing 1.2.9 tomorrow or Wed would be ideal, to give users some time for post-release testing before the New Year.

mjsteinbaugh commented 2 years ago

OK I'll work on fixing this today

mjsteinbaugh commented 2 years ago

Can you send me a copy of the example data from /n/data1/cores/bcbio/naumenko/_example_bcbio_runs/3_bulk_rnaseq_6samples_chr22_fast/seqc/final above? That will be easier to test locally

naumenko-sa commented 2 years ago

Sure, uploading them here: https://www.dropbox.com/sh/w9ogvhbeqirluq4/AAB-YpjkbhgUP8YHpZOfV9sTa?dl=0 should be 6 files in ~ 30 min

mjsteinbaugh commented 2 years ago

What's the R code that gets passed in the bcbioRNASeq() function call? I'm having difficulty reproducing this with any test dataset

mjsteinbaugh commented 2 years ago

Ah OK think I may have it -- seems to be a situation where level = "genes" is working but level = "transcripts" is not working as expected. I'll dig into this further

mjsteinbaugh commented 2 years ago

Here's the error with more verbosity, I'm working on a version bump that will fix this:

→ Importing salmon transcript-level counts from quant.sf files using tximport 1.22.0.
countsFromAbundance: lengthScaledTPM
txOut: TRUE
reading in files with read_tsv
1 2 3 4 5 6 
Error: Assert failure.
[2] identical(rownames(infReps[[1L]]),
rownames(abundance)) is not TRUE.
Backtrace:
    █
 1. └─bcbioRNASeq::bcbioRNASeq(uploadDir, level = "transcripts")
 2.   └─bcbioRNASeq::.tximport(...) R/AllGenerators.R:397:8
 3.     ├─goalie::assert(.isTximportReturn(txi)) R/internal-tximport.R:110:4
 4.     └─bcbioRNASeq::.isTximportReturn(txi) R/internal-tximport.R:110:4
 5.       └─goalie::assert(...) R/internal-tximport.R:145:8
 6.         └─AcidCLI:::stop(...)
 7.           └─cli::cli_abort(x)

naumenko-sa commented 2 years ago

Thanks Michael! The upload is done!

mjsteinbaugh commented 2 years ago

Cool I think I have a working fix, will push an update to GitHub soon

mjsteinbaugh commented 2 years ago

I'm working on some additional improvements to the package that we can table for a later release...this fix should work so you can push the bcbio-nextgen 1.2.9 update

mjsteinbaugh commented 2 years ago

OK I think bcbioRNASeq v0.3.43 should fix this issue. I'm working on updating it on bioconda.

naumenko-sa commented 2 years ago

Thanks Michael for the quick fix, we are almost there!

I confirm that after the manual update in anaconda/envs/rbcbiornaseq/bin/R with

if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
install.packages(
    pkgs = "bcbioRNASeq",
    repos = c(
        "https://r.acidgenomics.com",
        BiocManager::repositories()
    )
)

It passes the previous break point. It fails then at https://github.com/bcbio/bcbio-nextgen/blob/master/bcbio/rnaseq/bcbiornaseq.py#L110 with:

subprocess.CalledProcessError: Command '/n/data1/cores/bcbio/naumenko/bcbio_devel/anaconda/envs/rbcbiornaseq/bin/Rscript --vanilla -e rmarkdown::draft("/n/data1/cores/bcbio/naumenko/_example_bcbio_runs/3_bulk_rnaseq_6samples_chr22_fast/seqc/final/bcbioRNASeq/quality_control.Rmd", template="quality_control", package="bcbioRNASeq", edit=FALSE)
Error in rmarkdown::draft("/n/data1/cores/bcbio/naumenko/_example_bcbio_runs/3_bulk_rnaseq_6samples_chr22_fast/seqc/final/bcbioRNASeq/quality_control.Rmd",  : 
  The template 'quality_control' was not found in the bcbioRNASeq package

Could you please take a look?

Sergey

mjsteinbaugh commented 2 years ago

Ah OK thanks, I'll take a look. I'm working on the r-bcbiornaseq bioconda recipe update here: https://github.com/bioconda/bioconda-recipes/pull/31978

mjsteinbaugh commented 2 years ago

To resolve this quality_control template error, I think it requires changing to 01-quality-control, see inst/rmarkdown/templates here: https://github.com/hbc/bcbioRNASeq/tree/master/inst/rmarkdown/templates
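
On the bcbio side that would amount to changing the template name in the rmarkdown::draft() call shown in the error above. A minimal sketch, assuming a hypothetical `draft_command` helper (the real code lives in bcbio/rnaseq/bcbiornaseq.py and may be structured differently):

```python
def draft_command(rmd_path, template="01-quality-control", package="bcbioRNASeq"):
    """Build the R expression that drafts an R Markdown file from a package template.

    Hypothetical helper for illustration; not part of bcbio-nextgen.
    """
    return (
        'rmarkdown::draft("{path}", template="{template}", '
        'package="{package}", edit=FALSE)'
    ).format(path=rmd_path, template=template, package=package)


print(draft_command("final/bcbioRNASeq/quality_control.Rmd"))
```

The output file can keep the quality_control.Rmd name; only the template argument has to match the directory name under inst/rmarkdown/templates.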

naumenko-sa commented 2 years ago

I've fixed the template path, now I am getting this error:

[2021-12-13T22:49Z] Creating bcbioRNASeq quality control template.
[2021-12-13T22:49Z] Editing bcbioRNAseq quality control template.
[2021-12-13T22:49Z] Rendering bcbioRNASeq quality control report.
[2021-12-13T22:49Z] processing file: quality_control.Rmd
  |..                                                                    |   2%
[2021-12-13T22:49Z]    inline R code fragments
  |...                                                                   |   5%
[2021-12-13T22:49Z] label: setup (with options)
[2021-12-13T22:49Z] List of 2
[2021-12-13T22:49Z]  $ cache  : logi FALSE
[2021-12-13T22:49Z]  $ message: logi FALSE
  |.....                                                                 |   7%
[2021-12-13T22:49Z]   ordinary text without R code
  |......                                                                |   9%
[2021-12-13T22:49Z] label: header (with options)
[2021-12-13T22:49Z] List of 1
[2021-12-13T22:49Z]  $ child: chr "_header.Rmd"
[2021-12-13T22:49Z] processing file: ./_header.Rmd
  |......................................................................| 100%
[2021-12-13T22:49Z]   ordinary text without R code
  |........                                                              |  11%
[2021-12-13T22:49Z]   ordinary text without R code
  |..........                                                            |  14%
[2021-12-13T22:49Z] label: load-object
[2021-12-13T22:49Z] Quitting from lines 39-49 (quality_control.Rmd)
[2021-12-13T22:49Z] Error in .local(file, ...) : Assert failure.
[2021-12-13T22:49Z] [1] isAFile(file) || isAURL(file) is not TRUE.
[2021-12-13T22:49Z] Calls: <Anonymous> ... eval -> eval -> import -> import -> .local -> assert
[2021-12-13T22:49Z] Execution halted
[2021-12-13T22:49Z] Uncaught exception occurred
Traceback (most recent call last):
  File "/n/data1/cores/bcbio/naumenko/bcbio_devel/anaconda/lib/python3.7/site-packages/bcbio_nextgen-1.2.9a0-py3.7.egg/bcbio/provenance/do.py", line 26, in run
    _do_run(cmd, checks, log_stdout, env=env)
  File "/n/data1/cores/bcbio/naumenko/bcbio_devel/anaconda/lib/python3.7/site-packages/bcbio_nextgen-1.2.9a0-py3.7.egg/bcbio/provenance/do.py", line 106, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
subprocess.CalledProcessError: Command '/n/data1/cores/bcbio/naumenko/bcbio_devel/anaconda/envs/rbcbiornaseq/bin/Rscript --vanilla -e rmarkdown::render("/n/data1/cores/bcbio/naumenko/_example_bcbio_runs/3_bulk_rnaseq_6samples_chr22_fast/seqc/final/bcbioRNASeq/quality_control.Rmd")
processing file: quality_control.Rmd
  |..                                                                    |   2%
   inline R code fragments
  |...                                                                   |   5%
label: setup (with options) 
List of 2
 $ cache  : logi FALSE
 $ message: logi FALSE
  |.....                                                                 |   7%
  ordinary text without R code
  |......                                                                |   9%
label: header (with options) 
List of 1
 $ child: chr "_header.Rmd"
processing file: ./_header.Rmd
  |......................................................................| 100%
  ordinary text without R code
  |........                                                              |  11%
  ordinary text without R code
  |..........                                                            |  14%
label: load-object
Quitting from lines 39-49 (quality_control.Rmd) 
Error in .local(file, ...) : Assert failure.
[1] isAFile(file) || isAURL(file) is not TRUE.
Calls: <Anonymous> ... eval -> eval -> import -> import -> .local -> assert
Execution halted
' returned non-zero exit status 1.
Traceback (most recent call last):
  File "/n/data1/cores/bcbio/naumenko/bcbio_devel/anaconda/bin/bcbio_nextgen.py", line 4, in <module>
    __import__('pkg_resources').run_script('bcbio-nextgen==1.2.9a0', 'bcbio_nextgen.py')
  File "/n/data1/cores/bcbio/naumenko/bcbio_devel/anaconda/lib/python3.7/site-packages/pkg_resources/__init__.py", line 651, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/n/data1/cores/bcbio/naumenko/bcbio_devel/anaconda/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1448, in run_script
    exec(code, namespace, namespace)
  File "/n/data1/cores/bcbio/naumenko/bcbio_devel/anaconda/lib/python3.7/site-packages/bcbio_nextgen-1.2.9a0-py3.7.egg/EGG-INFO/scripts/bcbio_nextgen.py", line 245, in <module>
    main(**kwargs)
  File "/n/data1/cores/bcbio/naumenko/bcbio_devel/anaconda/lib/python3.7/site-packages/bcbio_nextgen-1.2.9a0-py3.7.egg/EGG-INFO/scripts/bcbio_nextgen.py", line 46, in main
    run_main(**kwargs)
  File "/n/data1/cores/bcbio/naumenko/bcbio_devel/anaconda/lib/python3.7/site-packages/bcbio_nextgen-1.2.9a0-py3.7.egg/bcbio/pipeline/main.py", line 50, in run_main
    fc_dir, run_info_yaml)
  File "/n/data1/cores/bcbio/naumenko/bcbio_devel/anaconda/lib/python3.7/site-packages/bcbio_nextgen-1.2.9a0-py3.7.egg/bcbio/pipeline/main.py", line 91, in _run_toplevel
    for xs in pipeline(config, run_info_yaml, parallel, dirs, samples):
  File "/n/data1/cores/bcbio/naumenko/bcbio_devel/anaconda/lib/python3.7/site-packages/bcbio_nextgen-1.2.9a0-py3.7.egg/bcbio/pipeline/main.py", line 290, in rnaseqpipeline
    run_parallel("run_bcbiornaseqload", [sample])
  File "/n/data1/cores/bcbio/naumenko/bcbio_devel/anaconda/lib/python3.7/site-packages/bcbio_nextgen-1.2.9a0-py3.7.egg/bcbio/distributed/multi.py", line 28, in run_parallel
    return run_multicore(fn, items, config, parallel=parallel)
  File "/n/data1/cores/bcbio/naumenko/bcbio_devel/anaconda/lib/python3.7/site-packages/bcbio_nextgen-1.2.9a0-py3.7.egg/bcbio/distributed/multi.py", line 86, in run_multicore
    for data in joblib.Parallel(parallel["num_jobs"], batch_size=1, backend="multiprocessing")(joblib.delayed(fn)(*x) for x in items):
  File "/n/data1/cores/bcbio/naumenko/bcbio_devel/anaconda/lib/python3.7/site-packages/joblib/parallel.py", line 1048, in __call__
    if self.dispatch_one_batch(iterator):
  File "/n/data1/cores/bcbio/naumenko/bcbio_devel/anaconda/lib/python3.7/site-packages/joblib/parallel.py", line 866, in dispatch_one_batch
    self._dispatch(tasks)
  File "/n/data1/cores/bcbio/naumenko/bcbio_devel/anaconda/lib/python3.7/site-packages/joblib/parallel.py", line 784, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/n/data1/cores/bcbio/naumenko/bcbio_devel/anaconda/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
    result = ImmediateResult(func)
  File "/n/data1/cores/bcbio/naumenko/bcbio_devel/anaconda/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 572, in __init__
    self.results = batch()
  File "/n/data1/cores/bcbio/naumenko/bcbio_devel/anaconda/lib/python3.7/site-packages/joblib/parallel.py", line 263, in __call__
    for func, args, kwargs in self.items]
  File "/n/data1/cores/bcbio/naumenko/bcbio_devel/anaconda/lib/python3.7/site-packages/joblib/parallel.py", line 263, in <listcomp>
    for func, args, kwargs in self.items]
  File "/n/data1/cores/bcbio/naumenko/bcbio_devel/anaconda/lib/python3.7/site-packages/bcbio_nextgen-1.2.9a0-py3.7.egg/bcbio/utils.py", line 59, in wrapper
    return f(*args, **kwargs)
  File "/n/data1/cores/bcbio/naumenko/bcbio_devel/anaconda/lib/python3.7/site-packages/bcbio_nextgen-1.2.9a0-py3.7.egg/bcbio/distributed/multitasks.py", line 92, in run_bcbiornaseqload
    return bcbiornaseq.make_bcbiornaseq_object(*args)
  File "/n/data1/cores/bcbio/naumenko/bcbio_devel/anaconda/lib/python3.7/site-packages/bcbio_nextgen-1.2.9a0-py3.7.egg/bcbio/rnaseq/bcbiornaseq.py", line 41, in make_bcbiornaseq_object
    make_quality_report(data)
  File "/n/data1/cores/bcbio/naumenko/bcbio_devel/anaconda/lib/python3.7/site-packages/bcbio_nextgen-1.2.9a0-py3.7.egg/bcbio/rnaseq/bcbiornaseq.py", line 65, in make_quality_report
    render_rmarkdown_file(quality_rmd)
  File "/n/data1/cores/bcbio/naumenko/bcbio_devel/anaconda/lib/python3.7/site-packages/bcbio_nextgen-1.2.9a0-py3.7.egg/bcbio/rnaseq/bcbiornaseq.py", line 110, in render_rmarkdown_file
    do.run([rcmd, "--vanilla", "-e", render_string], "Rendering bcbioRNASeq quality control report.")
  File "/n/data1/cores/bcbio/naumenko/bcbio_devel/anaconda/lib/python3.7/site-packages/bcbio_nextgen-1.2.9a0-py3.7.egg/bcbio/provenance/do.py", line 26, in run
    _do_run(cmd, checks, log_stdout, env=env)
  File "/n/data1/cores/bcbio/naumenko/bcbio_devel/anaconda/lib/python3.7/site-packages/bcbio_nextgen-1.2.9a0-py3.7.egg/bcbio/provenance/do.py", line 106, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
subprocess.CalledProcessError: Command '/n/data1/cores/bcbio/naumenko/bcbio_devel/anaconda/envs/rbcbiornaseq/bin/Rscript --vanilla -e rmarkdown::render("/n/data1/cores/bcbio/naumenko/_example_bcbio_runs/3_bulk_rnaseq_6samples_chr22_fast/seqc/final/bcbioRNASeq/quality_control.Rmd")
' returned non-zero exit status 1.

Any chance you could run a bcbio run yourself to catch all the issues at once?

Sergey

mjsteinbaugh commented 2 years ago

Yeah I'll work on setting up a new bcbio dev install inside of Docker and test this out tomorrow. I'm also hitting some snags with the bioconda recipe update when attempting to update the R dependencies to R 4.1 / Bioconductor 3.14. We can get this sorted out this week but it will take a little debugging. Thanks!

mjsteinbaugh commented 2 years ago

I think that step is erroring out with the QC template because we need to update the R Markdown params (see here, bcb_file: https://github.com/hbc/bcbioRNASeq/blob/master/inst/rmarkdown/templates/01-quality-control/skeleton/skeleton.Rmd#L11)
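
One way to do that without editing the template is to pass the parameter at render time through the `params` argument of rmarkdown::render(). A minimal sketch, assuming a hypothetical `render_command` helper and assuming bcb_file is the only parameter that needs overriding (the template may define others):

```python
def render_command(rmd_path, bcb_file):
    """Build the R expression that renders an Rmd file with bcb_file overridden.

    Hypothetical helper for illustration; not part of bcbio-nextgen.
    """
    return (
        'rmarkdown::render("{rmd}", params=list(bcb_file="{bcb}"))'
    ).format(rmd=rmd_path, bcb=bcb_file)


print(render_command("quality_control.Rmd", "data/bcb.rds"))
```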

mjsteinbaugh commented 2 years ago

Corresponding bcbio-nextgen Python code to render the R Markdown quality control template is here for reference: https://github.com/bcbio/bcbio-nextgen/blob/master/bcbio/rnaseq/bcbiornaseq.py#L44

naumenko-sa commented 2 years ago

Unfortunately, the bcbio Docker image is not working, so if you don't have a bcbio installation at hand, I doubt you could debug bcbio + bcbioRNASeq.

bcbioRNASeq writes the rds to data/bcb.rds, but the template reads it from rds/YYYY-MM-DD (literally YYYY-MM-DD, not the actual date). I've fixed the bcbio code to make an extra copy of bcb.rds, and also proposed a change here, please merge: https://github.com/hbc/bcbioRNASeq/pull/177
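
For readers following along, the extra-copy workaround described above could look roughly like the sketch below. It is only an illustration, not the actual bcbio change: the function name `mirror_bcb_rds` and its `template_subdir` argument are hypothetical, and the only facts taken from this thread are the two paths (`data/bcb.rds` and the literal `rds/YYYY-MM-DD`).

```python
import shutil
from pathlib import Path


def mirror_bcb_rds(report_dir, template_subdir="rds/YYYY-MM-DD"):
    """Copy data/bcb.rds to the path the QC template tries to read.

    Hypothetical helper: `report_dir` is assumed to be the bcbioRNASeq
    output directory that already contains data/bcb.rds.
    """
    report_dir = Path(report_dir)
    src = report_dir / "data" / "bcb.rds"
    dst = report_dir / template_subdir / "bcb.rds"
    if not src.is_file():
        raise FileNotFoundError(src)
    # Create rds/YYYY-MM-DD (the literal placeholder name) if it is missing.
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)
    return dst
```

Keeping the original file in place and adding a second copy means both the old and the updated template paths resolve, at the cost of duplicating the object on disk.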

Once bcb.rds is read successfully, I get the next error:

subprocess.CalledProcessError: Command '/n/data1/cores/bcbio/naumenko/bcbio_devel/anaconda/envs/rbcbiornaseq/bin/Rscript --vanilla -e rmarkdown::render("/n/data1/cores/bcbio/naumenko/_example_bcbio_runs/3_bulk_rnaseq_6samples_chr22_fast/seqc/final/bcbioRNASeq/quality_control.Rmd")
processing file: quality_control.Rmd
  |..                                                                    |   2%
   inline R code fragments
  |...                                                                   |   5%
label: setup (with options) 
List of 2
 $ cache  : logi FALSE
 $ message: logi FALSE
  |.....                                                                 |   7%
  ordinary text without R code
  |......                                                                |   9%
label: header (with options) 
List of 1
 $ child: chr "_header.Rmd"
processing file: ./_header.Rmd
  |......................................................................| 100%
  ordinary text without R code
  |........                                                              |  11%
  ordinary text without R code
  |..........                                                            |  14%
label: load-object
→ Importing 'rds/YYYY-MM-DD/bcb.rds' using base::`readRDS()`.
  |...........                                                           |  16%
   inline R code fragments
  |.............                                                         |  18%
label: sample-data
  |..............                                                        |  20%
  ordinary text without R code
  |................                                                      |  23%
label: plot-total-reads
Quitting from lines 67-68 (quality_control.Rmd) 
Error in .local(object, ...) : Assert failure.
[4] isGGScale(fill, scale = "discrete", aes = "fill", nullOK = TRUE) is not TRUE.
Cause: 'x' is not all of: ScaleDiscrete, Scale, ggproto, gg.
Calls: <Anonymous> ... plotTotalReads -> plotTotalReads -> .local -> assert
Execution halted

naumenko-sa commented 2 years ago

I think this happens only because the test sample has fewer than 10 million reads, so I wrapped the call in try and the run now finishes OK.
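
A minimal sketch of what "enclosing it in try" can mean on the Python side is shown below. It is not the bcbio implementation: `render_report_safely` is a hypothetical name, and the sketch simply assumes the report is rendered by an external command whose failure should be logged rather than propagated.

```python
import logging
import subprocess

logger = logging.getLogger(__name__)


def render_report_safely(cmd, description="quality report"):
    """Run `cmd` and return True on success, False on failure.

    A non-zero exit status or a missing executable is logged instead of
    raised, so the surrounding pipeline can continue without the report.
    """
    try:
        subprocess.run(cmd, check=True, capture_output=True, text=True)
    except (subprocess.CalledProcessError, OSError) as exc:
        logger.error("bcbiornaseq error at %s: %s", description, exc)
        return False
    return True
```

It would be called with the same kind of command that appears in the log, for example `render_report_safely(["Rscript", "--vanilla", "-e", 'rmarkdown::render("quality_control.Rmd")'])`. The trade-off is that the pipeline finishes, but no report is produced when rendering fails.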

@mjsteinbaugh, I see you took on a big role recently: https://mike.steinbaugh.com/ Congratulations on the promotion! You likely have a negative amount of time for the bcbioRNASeq project now?

Thanks for all your work on that! It is an incredible package! I'll try to take it from here (a new release and bioconda recipe). I'll bug you if I get stuck.

SN

naumenko-sa commented 2 years ago

I've pushed the Rmd change, created a new tag, and submitted a PR to bioconda: https://github.com/bioconda/bioconda-recipes/pull/31985

If you could help get it merged, it would be really appreciated. I would then be able to go ahead with a bcbio release.

mjsteinbaugh commented 2 years ago

@naumenko-sa OK I'm working on the bioconda build this morning https://github.com/bioconda/bioconda-recipes/pull/31978/

naumenko-sa commented 2 years ago

@mjsteinbaugh I see you are reverting r-bcbiornaseq back to R 4.0 and Bioconductor 3.13.

On the bcbio side: the conda installation of r-bcbiornaseq=0.3.42 with R 4.1 and Bioconductor 3.14 in a separate env went OK, and it worked OK (apart from the latest small fixes). We have already introduced R 4.1 native pipes in the bcbio code for bcbiornaseq calls (https://github.com/bcbio/bcbio-nextgen/blob/master/bcbio/rnaseq/bcbiornaseq.py#L170), so reverting to Bioconductor 3.13 will break the bcbio code.

It is easy to fix; just let me know whether Bioconductor 3.13 is the final choice for r-bcbiornaseq=0.3.44.

Sorry, I am releasing today, I need a freeze of bcbio code.

mjsteinbaugh commented 2 years ago

Yeah that works, I just pinned the draft bioconda recipe back specifically to R 4.1 / Bioconductor 3.14. I'm just having build timeout issues that we need to resolve with the bioconda team, then should be good to go!

https://github.com/bioconda/bioconda-recipes/pull/31978

naumenko-sa commented 2 years ago

the recipe is merged, thanks so much!

naumenko-sa commented 2 years ago

Unfortunately, after installing r-bcbiornaseq=0.3.44 from conda, it still breaks when rendering the quality control report with a real-size dataset (>100 million reads/sample).

It does not break bcbio, since I protected the call with try, but the report.html is not generated.

the error:

processing file: quality_control.Rmd
  |..                                                                    |   2%
   inline R code fragments
  |...                                                                   |   5%
label: setup (with options) 
List of 2
 $ cache  : logi FALSE
 $ message: logi FALSE
  |.....                                                                 |   7%
  ordinary text without R code
  |......                                                                |   9%
label: header (with options) 
List of 1
 $ child: chr "_header.Rmd"
processing file: ./_header.Rmd
  |......................................................................| 100%
  ordinary text without R code
  |........                                                              |  11%
  ordinary text without R code
  |..........                                                            |  14%
label: load-object
→ Importing 'data/bcb.rds' using base::`readRDS()`.
  |...........                                                           |  16%
   inline R code fragments
  |.............                                                         |  18%
label: sample-data
  |..............                                                        |  20%
  ordinary text without R code
  |................                                                      |  23%
label: plot-total-reads
Quitting from lines 67-68 (quality_control.Rmd) 
Error in .local(object, ...) : Assert failure.
[4] isGGScale(fill, scale = "discrete", aes = "fill", nullOK = TRUE) is not TRUE.
Cause: 'x' is not all of: ScaleDiscrete, Scale, ggproto, gg.
Calls: <Anonymous> ... plotTotalReads -> plotTotalReads -> .local -> assert
Execution halted
' returned non-zero exit status 1.
[2021-12-14T20:44Z] bcbiornaseq error at quality report

I've uploaded bcb.rds and quality_control.Rmd here to debug: https://www.dropbox.com/sh/w9ogvhbeqirluq4/AAB-YpjkbhgUP8YHpZOfV9sTa?dl=0

I am going forward with the bcbio release; it would be nice to fix this without altering the bcbio code.

mjsteinbaugh commented 2 years ago

OK I'll look into this and maybe we can do a minor bug fix in bcbioRNASeq to address it

mjsteinbaugh commented 2 years ago

@naumenko-sa OK these issues should be fixed with r-acidmarkdown v0.1.5, which I'm rolling out onto bioconda shortly.

naumenko-sa commented 2 years ago

Good job @mjsteinbaugh. I confirm that it works in the seqc bcbio test: the report is there. @amizeranschi let us know if that works for you as well!

amizeranschi commented 2 years ago

@naumenko-sa @mjsteinbaugh

Thanks again for all your help so far. After upgrading to the latest development version and getting the sacCer3 data, the RNA-seq analysis progressed further for me, but still ended up crashing.

Let me know if you want me to share a script with everything I'm doing here, in case it could help with reproducing and debugging. Here's the error I'm running into:

[2021-12-18T11:58Z] multiprocessing: upload_samples_project
[2021-12-18T11:58Z] Storing in local filesystem: /home/user/bcbio-runs/rna-seq/rna-seq-analysis/final/2021-12-18_rna-seq-analysis/bcbio-nextgen.log
[2021-12-18T11:58Z] multiprocessing: upload_samples_project
[2021-12-18T11:58Z] multiprocessing: upload_samples_project
[2021-12-18T11:58Z] multiprocessing: upload_samples_project
[2021-12-18T11:58Z] Storing in local filesystem: /home/user/bcbio-runs/rna-seq/rna-seq-analysis/final/2021-12-18_rna-seq-analysis/bcbio-nextgen.log
[2021-12-18T11:58Z] multiprocessing: upload_samples_project
[2021-12-18T11:58Z] Timing: bcbioRNAseq loading
[2021-12-18T11:58Z] multiprocessing: run_bcbiornaseqload
[2021-12-18T11:58Z] Loading bcbioRNASeq object.
[2021-12-18T11:58Z] Loading required package: basejump
[2021-12-18T11:58Z] Attaching package: ‘basejump’
[2021-12-18T11:58Z] The following objects are masked from ‘package:stats’:
[2021-12-18T11:58Z]     complete.cases, cor, end, median, na.omit, quantile, sd, start, var
[2021-12-18T11:58Z] The following objects are masked from ‘package:utils’:
[2021-12-18T11:58Z]     head, relist, tail
[2021-12-18T11:58Z] The following objects are masked from ‘package:base’:
[2021-12-18T11:58Z]     %in%, anyDuplicated, append, as.factor, as.list, as.matrix,
[2021-12-18T11:58Z]     as.table, basename, cbind, colnames, colnames<-, colSums, dirname,
[2021-12-18T11:58Z]     do.call, duplicated, eval, expand.grid, get, grep, grepl, gsub,
[2021-12-18T11:58Z]     intersect, is.unsorted, lapply, mapply, match, mean, merge, mget,
[2021-12-18T11:58Z]     ncol, nrow, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
[2021-12-18T11:58Z]     rbind, rep.int, rowMeans, rownames, rownames<-, rowSums, sapply,
[2021-12-18T11:58Z]     setdiff, sort, split, sub, subset, summary, t, table, tapply,
[2021-12-18T11:58Z]     union, unique, unsplit, which, which.max, which.min
[2021-12-18T11:58Z] 🧪 # bcbioRNASeq
[2021-12-18T11:58Z] ℹ Importing bcbio-nextgen RNA-seq run.
[2021-12-18T11:58Z] 🧪 ## Run info
[2021-12-18T11:58Z] uploadDir: /home/user/bcbio-runs/rna-seq/rna-seq-analysis/final
[2021-12-18T11:58Z] projectDir: 2021-12-18_rna-seq-analysis
[2021-12-18T11:58Z] ℹ 7 samples detected:
[2021-12-18T11:58Z] • AE1
[2021-12-18T11:58Z] • AE2
[2021-12-18T11:58Z] • AE3
[2021-12-18T11:58Z] • bcbioRNASeq
[2021-12-18T11:58Z] • RT1
[2021-12-18T11:58Z] • RT2
[2021-12-18T11:58Z] • RT3
[2021-12-18T11:58Z] → Importing '/home/user/bcbio-runs/rna-seq/rna-seq-analysis/final/2021-12-18_rna-seq-analysis/project-summary.yaml' using yaml::`yaml.load_file()`.
[2021-12-18T11:58Z] → Importing '/home/user/bcbio-runs/rna-seq/rna-seq-analysis/final/2021-12-18_rna-seq-analysis/data_versions.csv' using data.table::`fread()`.
[2021-12-18T11:58Z] → Importing '/home/user/bcbio-runs/rna-seq/rna-seq-analysis/final/2021-12-18_rna-seq-analysis/programs.txt' using data.table::`fread()`.
[2021-12-18T11:58Z] → Importing '/home/user/bcbio-runs/rna-seq/rna-seq-analysis/final/2021-12-18_rna-seq-analysis/bcbio-nextgen.log' using base::`readLines()`.
[2021-12-18T11:58Z] → Importing '/home/user/bcbio-runs/rna-seq/rna-seq-analysis/final/2021-12-18_rna-seq-analysis/bcbio-nextgen-commands.log' using base::`readLines()`.
[2021-12-18T11:58Z] 🧪 ## Sample metadata
[2021-12-18T11:58Z] → Getting sample metadata from YAML.
[2021-12-18T11:58Z] Loading a subset of samples:
[2021-12-18T11:58Z] • AE1
[2021-12-18T11:58Z] • AE2
[2021-12-18T11:58Z] • AE3
[2021-12-18T11:58Z] • RT1
[2021-12-18T11:58Z] • RT2
[2021-12-18T11:58Z] • RT3
[2021-12-18T11:58Z] → Getting sample quality control metrics from YAML.
[2021-12-18T11:58Z] 🧪 ## Counts
[2021-12-18T11:58Z] 🧪 ### tximport
[2021-12-18T11:58Z] → Importing '/home/user/bcbio-runs/rna-seq/rna-seq-analysis/final/2021-12-18_rna-seq-analysis/tx2gene.csv' using data.table::`fread()`.
[2021-12-18T11:58Z] Error in validObject(.Object) :
[2021-12-18T11:58Z]   invalid class “Tx2Gene” object: Some transcript and gene identifiers are identical.
[2021-12-18T11:58Z] Calls: bcbioRNASeq ... .local -> new -> initialize -> initialize -> validObject
[2021-12-18T11:58Z] Execution halted
[2021-12-18T11:58Z] Uncaught exception occurred
Traceback (most recent call last):
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/provenance/do.py", line 26, in run
    _do_run(cmd, checks, log_stdout, env=env)
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/provenance/do.py", line 106, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
subprocess.CalledProcessError: Command '/home/user/bcbio-nextgen/anaconda/envs/rbcbiornaseq/bin/Rscript --vanilla /home/user/bcbio-runs/rna-seq/rna-seq-analysis/final/bcbioRNASeq/load_bcbioRNAseq.R
Loading required package: basejump
Attaching package: ‘basejump’
The following objects are masked from ‘package:stats’:
    complete.cases, cor, end, median, na.omit, quantile, sd, start, var
The following objects are masked from ‘package:utils’:
    head, relist, tail
The following objects are masked from ‘package:base’:
    %in%, anyDuplicated, append, as.factor, as.list, as.matrix,
    as.table, basename, cbind, colnames, colnames<-, colSums, dirname,
    do.call, duplicated, eval, expand.grid, get, grep, grepl, gsub,
    intersect, is.unsorted, lapply, mapply, match, mean, merge, mget,
    ncol, nrow, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rep.int, rowMeans, rownames, rownames<-, rowSums, sapply,
    setdiff, sort, split, sub, subset, summary, t, table, tapply,
    union, unique, unsplit, which, which.max, which.min
🧪 # bcbioRNASeq
ℹ Importing bcbio-nextgen RNA-seq run.
🧪 ## Run info
uploadDir: /home/user/bcbio-runs/rna-seq/rna-seq-analysis/final
projectDir: 2021-12-18_rna-seq-analysis
ℹ 7 samples detected:
• AE1
• AE2
• AE3
• bcbioRNASeq
• RT1
• RT2
• RT3
→ Importing '/home/user/bcbio-runs/rna-seq/rna-seq-analysis/final/2021-12-18_rna-seq-analysis/project-summary.yaml' using yaml::`yaml.load_file()`.
→ Importing '/home/user/bcbio-runs/rna-seq/rna-seq-analysis/final/2021-12-18_rna-seq-analysis/data_versions.csv' using data.table::`fread()`.
→ Importing '/home/user/bcbio-runs/rna-seq/rna-seq-analysis/final/2021-12-18_rna-seq-analysis/programs.txt' using data.table::`fread()`.
→ Importing '/home/user/bcbio-runs/rna-seq/rna-seq-analysis/final/2021-12-18_rna-seq-analysis/bcbio-nextgen.log' using base::`readLines()`.
→ Importing '/home/user/bcbio-runs/rna-seq/rna-seq-analysis/final/2021-12-18_rna-seq-analysis/bcbio-nextgen-commands.log' using base::`readLines()`.
🧪 ## Sample metadata
→ Getting sample metadata from YAML.
Loading a subset of samples:
• AE1
• AE2
• AE3
• RT1
• RT2
• RT3
→ Getting sample quality control metrics from YAML.
🧪 ## Counts
🧪 ### tximport
→ Importing '/home/user/bcbio-runs/rna-seq/rna-seq-analysis/final/2021-12-18_rna-seq-analysis/tx2gene.csv' using data.table::`fread()`.
Error in validObject(.Object) : 
  invalid class “Tx2Gene” object: Some transcript and gene identifiers are identical.
Calls: bcbioRNASeq ... .local -> new -> initialize -> initialize -> validObject
Execution halted
' returned non-zero exit status 1.
Traceback (most recent call last):
  File "/home/user/bcbio-nextgen/anaconda/bin/bcbio_nextgen.py", line 245, in <module>
    main(**kwargs)
  File "/home/user/bcbio-nextgen/anaconda/bin/bcbio_nextgen.py", line 46, in main
    run_main(**kwargs)
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/pipeline/main.py", line 50, in run_main
    fc_dir, run_info_yaml)
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/pipeline/main.py", line 91, in _run_toplevel
    for xs in pipeline(config, run_info_yaml, parallel, dirs, samples):
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/pipeline/main.py", line 290, in rnaseqpipeline
    run_parallel("run_bcbiornaseqload", [sample])
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/distributed/multi.py", line 28, in run_parallel
    return run_multicore(fn, items, config, parallel=parallel)
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/distributed/multi.py", line 86, in run_multicore
    for data in joblib.Parallel(parallel["num_jobs"], batch_size=1, backend="multiprocessing")(joblib.delayed(fn)(*x) for x in items):
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/joblib/parallel.py", line 1048, in __call__
    if self.dispatch_one_batch(iterator):
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/joblib/parallel.py", line 866, in dispatch_one_batch
    self._dispatch(tasks)
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/joblib/parallel.py", line 784, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
    result = ImmediateResult(func)
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 572, in __init__
    self.results = batch()
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/joblib/parallel.py", line 263, in __call__
    for func, args, kwargs in self.items]
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/joblib/parallel.py", line 263, in <listcomp>
    for func, args, kwargs in self.items]
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/utils.py", line 59, in wrapper
    return f(*args, **kwargs)
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/distributed/multitasks.py", line 92, in run_bcbiornaseqload
    return bcbiornaseq.make_bcbiornaseq_object(*args)
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/rnaseq/bcbiornaseq.py", line 31, in make_bcbiornaseq_object
    do.run([rcmd, "--vanilla", r_file], "Loading bcbioRNASeq object.")
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/provenance/do.py", line 26, in run
    _do_run(cmd, checks, log_stdout, env=env)
  File "/home/user/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/provenance/do.py", line 106, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
subprocess.CalledProcessError: Command '/home/user/bcbio-nextgen/anaconda/envs/rbcbiornaseq/bin/Rscript --vanilla /home/user/bcbio-runs/rna-seq/rna-seq-analysis/final/bcbioRNASeq/load_bcbioRNAseq.R
Loading required package: basejump
Attaching package: ‘basejump’
The following objects are masked from ‘package:stats’:
    complete.cases, cor, end, median, na.omit, quantile, sd, start, var
The following objects are masked from ‘package:utils’:
    head, relist, tail
The following objects are masked from ‘package:base’:
    %in%, anyDuplicated, append, as.factor, as.list, as.matrix,
    as.table, basename, cbind, colnames, colnames<-, colSums, dirname,
    do.call, duplicated, eval, expand.grid, get, grep, grepl, gsub,
    intersect, is.unsorted, lapply, mapply, match, mean, merge, mget,
    ncol, nrow, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rep.int, rowMeans, rownames, rownames<-, rowSums, sapply,
    setdiff, sort, split, sub, subset, summary, t, table, tapply,
    union, unique, unsplit, which, which.max, which.min
🧪 # bcbioRNASeq
ℹ Importing bcbio-nextgen RNA-seq run.
🧪 ## Run info
uploadDir: /home/user/bcbio-runs/rna-seq/rna-seq-analysis/final
projectDir: 2021-12-18_rna-seq-analysis
ℹ 7 samples detected:
• AE1
• AE2
• AE3
• bcbioRNASeq
• RT1
• RT2
• RT3
→ Importing '/home/user/bcbio-runs/rna-seq/rna-seq-analysis/final/2021-12-18_rna-seq-analysis/project-summary.yaml' using yaml::`yaml.load_file()`.
→ Importing '/home/user/bcbio-runs/rna-seq/rna-seq-analysis/final/2021-12-18_rna-seq-analysis/data_versions.csv' using data.table::`fread()`.
→ Importing '/home/user/bcbio-runs/rna-seq/rna-seq-analysis/final/2021-12-18_rna-seq-analysis/programs.txt' using data.table::`fread()`.
→ Importing '/home/user/bcbio-runs/rna-seq/rna-seq-analysis/final/2021-12-18_rna-seq-analysis/bcbio-nextgen.log' using base::`readLines()`.
→ Importing '/home/user/bcbio-runs/rna-seq/rna-seq-analysis/final/2021-12-18_rna-seq-analysis/bcbio-nextgen-commands.log' using base::`readLines()`.
🧪 ## Sample metadata
→ Getting sample metadata from YAML.
Loading a subset of samples:
• AE1
• AE2
• AE3
• RT1
• RT2
• RT3
→ Getting sample quality control metrics from YAML.
🧪 ## Counts
🧪 ### tximport
→ Importing '/home/user/bcbio-runs/rna-seq/rna-seq-analysis/final/2021-12-18_rna-seq-analysis/tx2gene.csv' using data.table::`fread()`.
Error in validObject(.Object) : 
  invalid class “Tx2Gene” object: Some transcript and gene identifiers are identical.
Calls: bcbioRNASeq ... .local -> new -> initialize -> initialize -> validObject
Execution halted
' returned non-zero exit status 1.

mjsteinbaugh commented 2 years ago

Thanks @amizeranschi I see the problem there, and it appears to be specific to the sacCer3 genome:

🧪 ### tximport
→ Importing '/home/user/bcbio-runs/rna-seq/rna-seq-analysis/final/2021-12-18_rna-seq-analysis/tx2gene.csv' using data.table::`fread()`.
Error in validObject(.Object) : 
  invalid class “Tx2Gene” object: Some transcript and gene identifiers are identical.
Calls: bcbioRNASeq ... .local -> new -> initialize -> initialize -> validObject

Can you post a copy of the tx2gene.csv file shown here so I can work on a fix?

The Tx2Gene class and importer are defined in our AcidGenomes package, for reference.
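
In the meantime, a quick way to look for the rows that might trigger this is sketched below. It rests on assumptions: that tx2gene.csv has the transcript identifier in the first column and the gene identifier in the second, and that "identical" means an identifier appears on both sides. The exact rule lives in the Tx2Gene validity method and is not reproduced here; `find_tx2gene_overlaps` is a hypothetical helper name.

```python
import csv


def find_tx2gene_overlaps(path):
    """Report identifiers used as both transcript and gene in a tx2gene CSV.

    Returns (same_row, shared):
      same_row: transcript IDs whose own row lists the same string as gene ID
      shared:   sorted IDs that occur in both columns anywhere in the file
    Assumes column 1 = transcript ID and column 2 = gene ID.
    """
    same_row = []
    tx_ids, gene_ids = set(), set()
    with open(path, newline="") as handle:
        for row in csv.reader(handle):
            if len(row) < 2:
                continue  # skip blank or malformed lines
            tx, gene = row[0].strip(), row[1].strip()
            tx_ids.add(tx)
            gene_ids.add(gene)
            if tx == gene:
                same_row.append(tx)
    return same_row, sorted(tx_ids & gene_ids)
```

Running it on the tx2gene.csv from the failing run and posting the first few hits would show whether the annotation reuses gene names as transcript names.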

Best, Mike

mjsteinbaugh commented 2 years ago

I also need to add an update to exclude the bcbioRNASeq directory, which is new in the bcbio-nextgen v1.2.9 update:

ℹ 7 samples detected:
• AE1
• AE2
• AE3
• bcbioRNASeq
• RT1
• RT2
• RT3

This should return:

ℹ 6 samples detected:
• AE1
• AE2
• AE3
• RT1
• RT2
• RT3

This is handled by the sampleDirs function in our bcbioBase package.
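
To make the intended behaviour concrete, here is a rough Python illustration of the filtering. It is not the bcbioBase code (which is R): the function name `sample_dirs`, the `NON_SAMPLE_DIRS` set, and the choice to skip the project directory by name are all assumptions made for this sketch.

```python
from pathlib import Path

# Directories under the upload dir that are outputs, not samples (assumed list).
NON_SAMPLE_DIRS = {"bcbioRNASeq"}


def sample_dirs(upload_dir, project_dir, exclude=NON_SAMPLE_DIRS):
    """List sample directory names under `upload_dir`.

    Skips the dated project directory and any name in `exclude`.
    """
    skip = set(exclude) | {project_dir}
    return sorted(
        entry.name
        for entry in Path(upload_dir).iterdir()
        if entry.is_dir() and entry.name not in skip
    )
```

With the directory layout from the log above, this would leave the six sample directories (AE1 to AE3, RT1 to RT3) and drop bcbioRNASeq.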