bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

Running sample configuration file is giving error #3554

Closed: aswani1208 closed this issue 2 years ago

aswani1208 commented 2 years ago

Version info

To Reproduce

Exact bcbio command you have used:

bcbio_nextgen.py /home/user/Fastq/bcbio/stable/anaconda/bin/undetermined/gatkrun_sample.yaml -n

Your sample configuration file:

details:
  - files: [/home/user/Fastq/bcbio/stable/anaconda/bin/Undetermined_S0_L001_R1_001.fastq.gz,/home/user/Fastq/bcbio/stable/anaconda/bin/Undetermined_S0_L001_R2_001.fastq.gz]
    description: Undetermined
    metadata:
      batch: undetermined
    analysis: 'variant2'
    genome_build: hg38
    algorithm:
      aligner: bwa
      variantcaller: [gatk-haplotype]
resources:
  defaults:
    cores: 16
    jvm_opts: [-Xms750m, -Xmx3500m]
    memory: 4G
upload:
  dir: /home/user/Fastq/bcbio/stable/anaconda/bin/undetermined/final

Observed behavior Error message or bcbio output:

global config: /home/user/Fastq/bcbio/stable/anaconda/bin/bcbio_system.yaml
run info config: /home/user/Fastq/bcbio/stable/anaconda/bin/undetermined/gatkrun_sample.yaml
[2021-11-08T10:30Z] System YAML configuration: /home/user/Fastq/bcbio/stable/galaxy/bcbio_system.yaml.
[2021-11-08T10:30Z] Locale set to C.UTF-8.
[2021-11-08T10:30Z] Resource requests: bwa, sambamba, samtools; memory: 4.00, 4.00, 4.00; cores: 16, 16, 16
[2021-11-08T10:30Z] Configuring 1 jobs to run, using 4 cores each with 16.1g of memory reserved for each job
[2021-11-08T10:30Z] Timing: organize samples
[2021-11-08T10:30Z] multiprocessing: organize_samples
[2021-11-08T10:30Z] Using input YAML configuration: /home/user/Fastq/bcbio/stable/anaconda/bin/undetermined/gatkrun_sample.yaml
[2021-11-08T10:30Z] Checking sample YAML configuration: /home/user/Fastq/bcbio/stable/anaconda/bin/undetermined/gatkrun_sample.yaml
Traceback (most recent call last):
  File "./bcbio_nextgen.py", line 245, in <module>
    main(**kwargs)
  File "./bcbio_nextgen.py", line 46, in main
    run_main(**kwargs)
  File "/home/user/Fastq/bcbio/stable/anaconda/lib/python3.7/site-packages/bcbio/pipeline/main.py", line 50, in run_main
    fc_dir, run_info_yaml)
  File "/home/user/Fastq/bcbio/stable/anaconda/lib/python3.7/site-packages/bcbio/pipeline/main.py", line 91, in _run_toplevel
    for xs in pipeline(config, run_info_yaml, parallel, dirs, samples):
  File "/home/user/Fastq/bcbio/stable/anaconda/lib/python3.7/site-packages/bcbio/pipeline/main.py", line 128, in variant2pipeline
    [x[0]["description"] for x in samples]]])
  File "/home/user/Fastq/bcbio/stable/anaconda/lib/python3.7/site-packages/bcbio/distributed/multi.py", line 28, in run_parallel
    return run_multicore(fn, items, config, parallel=parallel)
  File "/home/user/Fastq/bcbio/stable/anaconda/lib/python3.7/site-packages/bcbio/distributed/multi.py", line 86, in run_multicore
    for data in joblib.Parallel(parallel["num_jobs"], batch_size=1, backend="multiprocessing")(joblib.delayed(fn)(*x) for x in items):
  File "/home/user/Fastq/bcbio/stable/anaconda/lib/python3.7/site-packages/joblib/parallel.py", line 1048, in __call__
    if self.dispatch_one_batch(iterator):
  File "/home/user/Fastq/bcbio/stable/anaconda/lib/python3.7/site-packages/joblib/parallel.py", line 866, in dispatch_one_batch
    self._dispatch(tasks)
  File "/home/user/Fastq/bcbio/stable/anaconda/lib/python3.7/site-packages/joblib/parallel.py", line 784, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/home/user/Fastq/bcbio/stable/anaconda/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
    result = ImmediateResult(func)
  File "/home/user/Fastq/bcbio/stable/anaconda/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 572, in __init__
    self.results = batch()
  File "/home/user/Fastq/bcbio/stable/anaconda/lib/python3.7/site-packages/joblib/parallel.py", line 263, in __call__
    for func, args, kwargs in self.items]
  File "/home/user/Fastq/bcbio/stable/anaconda/lib/python3.7/site-packages/joblib/parallel.py", line 263, in <listcomp>
    for func, args, kwargs in self.items]
  File "/home/user/Fastq/bcbio/stable/anaconda/lib/python3.7/site-packages/bcbio/utils.py", line 59, in wrapper
    return f(*args, **kwargs)
  File "/home/user/Fastq/bcbio/stable/anaconda/lib/python3.7/site-packages/bcbio/distributed/multitasks.py", line 459, in organize_samples
    return run_info.organize(*args)
  File "/home/user/Fastq/bcbio/stable/anaconda/lib/python3.7/site-packages/bcbio/pipeline/run_info.py", line 62, in organize
    is_cwl=is_cwl, integrations=integrations)
  File "/home/user/Fastq/bcbio/stable/anaconda/lib/python3.7/site-packages/bcbio/pipeline/run_info.py", line 1026, in _run_info_from_yaml
    _check_sample_config(run_details, run_info_yaml, config)
  File "/home/user/Fastq/bcbio/stable/anaconda/lib/python3.7/site-packages/bcbio/pipeline/run_info.py", line 804, in _check_sample_config
    [_check_aligner(x) for x in items]
  File "/home/user/Fastq/bcbio/stable/anaconda/lib/python3.7/site-packages/bcbio/pipeline/run_info.py", line 804, in <listcomp>
    [_check_aligner(x) for x in items]
  File "/home/user/Fastq/bcbio/stable/anaconda/lib/python3.7/site-packages/bcbio/pipeline/run_info.py", line 676, in _check_aligner
    (item["algorithm"].get("aligner"), sorted(list(allowed))))
TypeError: '<' not supported between instances of 'NoneType' and 'bool'
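
The last frame of the traceback fails on `sorted(list(allowed))`, which appears to be building a message about the aligner setting rather than running the check itself. Python 3 cannot order `None` against a `bool`, so the sort raises before any message is shown. The sketch below reproduces that behaviour in isolation. The contents of `allowed` are an assumption inferred from the TypeError (aligner names plus `None` and `False`), and `sorted_for_display` is a hypothetical helper, not part of bcbio.

```python
def sorted_for_display(options):
    """Sort a collection that may mix str, None and bool values.

    Hypothetical helper: values are ordered by their string form, which
    avoids comparing unlike types directly.
    """
    return sorted(options, key=str)


# Assumed contents: aligner names plus None and False (inferred from the error).
allowed = {"bwa", "bowtie2", None, False}

try:
    sorted(list(allowed))
    plain_sort_failed = False
except TypeError:
    # Python 3 refuses to order None, bool and str against each other.
    plain_sort_failed = True

print(plain_sort_failed)
print(sorted_for_display(allowed))
```

Sorting by string form is only one option for display purposes; it does not say anything about why the aligner value was rejected in the first place.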

naumenko-sa commented 2 years ago

Hi @aswani1208 !

Thanks for using bcbio and sorry about the issues!

1) Could you please organize your project according to the structure outlined here: https://bcbio-nextgen.readthedocs.io/en/latest/contents/configuration.html#project-structure

(And please place it outside of the anaconda dir, which is a directory for bcbio code.)

2) Have you installed the bcbio tools and data, or just the bcbio package? That is, what is the output of `which bwa` in your bcbio installation? Do you have a genomes folder in the bcbio installation?

Sergey

aswani1208 commented 2 years ago

Hi Sergey, thank you for the response. I will organize the project according to the outlined structure. I used these commands for the installation:

python3 bcbio_nextgen_install.py stable --tooldir=/tools --nodata --mamba
bcbio_nextgen.py upgrade -u skip --genomes hg38 --aligners bwa

which bwa is returning /home/user/anaconda3/bin/bwa

Yes, I have a genomes folder in the bcbio installation:

user@:~/Fastq/bcbio/stable/genomes/Hsapiens/hg38 $ ls
bwa config coverage editing rnaseq rtg seq snpeff srnaseq validation variation versions.csv viral

naumenko-sa commented 2 years ago

Hi @aswani1208 !

What was your absolute PATH to the installation dir? /home/user/Fastq/bcbio/stable ? What is in your PATH?

If you installed bcbio to /home/user/Fastq/bcbio/stable, your bwa should be in /home/user/Fastq/bcbio/stable/anaconda/bin/bwa rather than in /home/user/anaconda3/bin/bwa.

You likely have a mixture of conda environments from bcbio and from /home/user/anaconda3. Please see the docs on which conda configs to alter: https://bcbio-nextgen.readthedocs.io/en/latest/contents/development.html?highlight=conda#creating-a-separate-bcbio-installation
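
The check described above (does `bwa` come from the bcbio installation or from another conda environment?) can be sketched in Python. This is a minimal illustration, not bcbio code: `resolves_inside` is a hypothetical helper, and the install directory is the one mentioned in this thread.

```python
import os
import shutil


def resolves_inside(tool_path, install_dir):
    """Return True if tool_path lies under install_dir (both absolute paths)."""
    tool_path = os.path.abspath(tool_path)
    install_dir = os.path.abspath(install_dir)
    return os.path.commonpath([tool_path, install_dir]) == install_dir


if __name__ == "__main__":
    install_dir = "/home/user/Fastq/bcbio/stable"  # install location from this thread
    found = shutil.which("bwa")  # searches PATH, similar to `which bwa`
    if found is None:
        print("bwa is not on PATH")
    elif resolves_inside(found, install_dir):
        print("bwa comes from the bcbio installation:", found)
    else:
        print("bwa comes from another environment:", found)
```

With the paths reported earlier, `/home/user/anaconda3/bin/bwa` falls outside the install directory, which is consistent with a mixed environment.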

Sergey

aswani1208 commented 2 years ago

Hi Sergey, thank you for the help. I was able to rectify the error, but I am again having issues with picard (similar to #3553).

It is showing:

File "/home/user/.local/lib/python3.7/site-packages/bcbio/provenance/do.py", line 106, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
subprocess.CalledProcessError: Command 'set -o pipefail; unset JAVA_HOME && export PATH=/home/user/Fastq/bcbio/stable/anaconda/bin:"$PATH" && picard -Xms750m -Xmx3500m -XX:+UseSerialGC CollectSequencingArtifactMetrics -REFERENCE_SEQUENCE /home/user/Fastq/bcbio/stable/genomes/Hsapiens/hg38/seq/hg38.fa -INPUT /home/user/Fastq/bcbio/undetermined/project/work/align/Undetermined/Undetermined-sort.bam -OUTPUT /home/user/Fastq/bcbio/undetermined/project/work/bcbiotx/tmpjylsv17b/Undetermined/Undetermined --VALIDATION_STRINGENCY SILENT
/bin/bash: picard: command not found
returned non-zero exit status 127.
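
Exit status 127 together with "command not found" means the shell could not find `picard` on the PATH that the command exported. A rough way to reproduce that lookup from Python is sketched below. `find_tool` is a hypothetical helper, not part of bcbio, and the directory is the one that appears in the failing command.

```python
import os
import shutil


def find_tool(name, path_prefix, inherited_path=""):
    """Look up an executable on a PATH built as <prefix>:<inherited>.

    Mirrors `export PATH=<prefix>:"$PATH"` from the failing command.
    Returns the full path of the first match, or None if nothing is found.
    """
    if inherited_path:
        search_path = path_prefix + os.pathsep + inherited_path
    else:
        search_path = path_prefix
    return shutil.which(name, path=search_path)


if __name__ == "__main__":
    bcbio_bin = "/home/user/Fastq/bcbio/stable/anaconda/bin"  # from the error above
    location = find_tool("picard", bcbio_bin, os.environ.get("PATH", ""))
    if location is None:
        print("picard not found; the shell would exit with status 127")
    else:
        print("picard found at", location)
```

A result of `None` only confirms that the executable is missing from those directories; it does not explain why it was not installed.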

naumenko-sa commented 2 years ago

Hi @aswani1208 !

Great news - could you please follow #3553 - they report that https://github.com/bcbio/bcbio-nextgen/pull/3556 solves the picard issue.

Sergey