bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

Smoove Lumpy error in structural variation pipeline #3633

Open prasundutta87 opened 2 years ago

prasundutta87 commented 2 years ago

Version info

To Reproduce

Exact bcbio command you have used:

bcbio_nextgen.py ../config/NA12878-sv.yaml -n 16

Your yaml configuration file:

# Evaluate structural variant calling on NA12878 whole genome data,
# using validated deletions and insertions from the Genome in a Bottle svclassify project
# https://groups.google.com/d/msg/genome-in-a-bottle/v3EDUgZT0Xo/eGYsQlJk2JMJ
#
# See the bcbio-nextgen documentation for full instructions to
# run this analysis:
# https://bcbio-nextgen.readthedocs.org/en/latest/contents/testing.html#example-pipelines
---
details:
- algorithm:
    aligner: bwa
    align_split_size: false
    exclude_regions: [lcr]
    recalibrate: false
    realign: false
    tools_off: [collectsequencingartifacts]
    variantcaller: false
    svcaller: [lumpy, manta, cnvkit, metasv, wham]
    svvalidate:
      DEL: ../input/giab-svclassify-deletions-2015-05-22.bed
      INS: ../input/giab-svclassify-insertions-2015-05-22.bed
  analysis: variant2
  description: NA12878
  files:
  - ../input/NA12878_1_1000000_reads.fastq
  - ../input/NA12878_2_1000000_reads.fastq
  genome_build: hg38
  metadata:
    batch: ceu
    sex: female
upload:
  dir: ../final

Log files (found in work/log), attached: bcbio-nextgen-commands.log, bcbio-nextgen-debug.log

prasundutta87 commented 2 years ago

I did a little digging on my system to find how many samtools executables there are in the bcbio directory, using the command find . -type f -name "samtools". This was the output:

./bcbio_data/anaconda/bin/samtools
./bcbio_data/anaconda/envs/python2/share/strelka-2.9.10-1/libexec/samtools
./bcbio_data/anaconda/envs/python2/share/manta-1.6.0-1/libexec/samtools
./bcbio_data/anaconda/envs/python2/bin/samtools
./bcbio_data/anaconda/envs/r35/bin/samtools
./bcbio_data/anaconda/envs/htslib1.10/bin/samtools
./bcbio_data/anaconda/envs/python3.6/bin/samtools
./bcbio_data/anaconda/envs/bwakit/bin/samtools
./bcbio_data/anaconda/envs/htslib1.12_py3.9/bin/samtools
./bcbio_data/anaconda/envs/samtools0/bin/samtools
./bcbio_data/anaconda/pkgs/samtools-1.7-1/bin/samtools
./bcbio_data/anaconda/pkgs/manta-1.6.0-h9ee0642_1/share/manta-1.6.0-1/libexec/samtools
./bcbio_data/anaconda/pkgs/samtools-1.15-h1170115_1/bin/samtools
./bcbio_data/anaconda/pkgs/strelka-2.9.10-h9ee0642_1/share/strelka-2.9.10-1/libexec/samtools
./bcbio_data/anaconda/pkgs/samtools-0.1.19-2/bin/samtools
./bcbio_data/anaconda/pkgs/samtools-1.10-h2e538c0_3/bin/samtools

When I executed each of them, the following samtools executables gave various shared library errors:

./bcbio_data/anaconda/envs/python2/bin/samtools
./bcbio_data/anaconda/envs/r35/bin/samtools
./bcbio_data/anaconda/envs/python3.6/bin/samtools
./bcbio_data/anaconda/envs/bwakit/bin/samtools
./bcbio_data/anaconda/envs/htslib1.12_py3.9/bin/samtools
./bcbio_data/anaconda/pkgs/samtools-1.7-1/bin/samtools
./bcbio_data/anaconda/pkgs/samtools-1.15-h1170115_1/bin/samtools
./bcbio_data/anaconda/pkgs/samtools-1.10-h2e538c0_3/bin/samtools

I suspect the Lumpy error is caused by these samtools shared library errors: I saw quite a few samtools errors in the error log, and the pipeline began to fail just after them.
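For anyone repeating this check, the manual "find every copy and run it" step can be sketched in Python. This is only an illustration, not part of bcbio: the function names (loader_error, find_executables, check_all) are made up for the example, and it assumes a Linux/glibc system where the dynamic loader prints the two error phrases quoted in this thread. Each copy is run with no arguments, so a copy that starts cleanly here may still fail later on a lazily resolved symbol.

```python
import os
import subprocess

# Phrases the glibc dynamic loader prints when a binary cannot start or
# cannot resolve a symbol (both appear in the errors quoted in this thread).
LOADER_ERRORS = ("error while loading shared libraries", "symbol lookup error")


def loader_error(stderr_text):
    """Return the first stderr line that looks like a loader failure, else None."""
    for line in stderr_text.splitlines():
        if any(marker in line for marker in LOADER_ERRORS):
            return line.strip()
    return None


def find_executables(root, name):
    """Yield regular (non-symlink) executable files called `name` below `root`,
    roughly what `find <root> -type f -name <name>` lists."""
    for dirpath, _dirnames, filenames in os.walk(root):
        if name in filenames:
            path = os.path.join(dirpath, name)
            if (os.path.isfile(path) and not os.path.islink(path)
                    and os.access(path, os.X_OK)):
                yield path


def check_all(root=".", name="samtools"):
    """Run each copy with no arguments and print whether the loader complained."""
    for path in sorted(find_executables(root, name)):
        proc = subprocess.run([path], stdout=subprocess.PIPE,
                              stderr=subprocess.PIPE, universal_newlines=True)
        problem = loader_error(proc.stderr)
        if problem:
            print("BROKEN  " + path + ": " + problem)
        else:
            print("OK      " + path)


# Example (run from the bcbio installation directory):
# check_all(".", "samtools")
# check_all(".", "bcftools")
```

The exit code is deliberately ignored: samtools prints its usage text and exits non-zero when run without arguments, so only the loader messages on stderr are treated as a failure.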

prasundutta87 commented 2 years ago

So, the last command that ran was:

export TMPDIR=/mnt/e1000/home/u027/u027/pdutta/bcbio-nextgen/NA12878-sv-eval/work/bcbiotx/tmpp35dspoe && export PATH=/mnt/e1000/home/u027/u027/pdutta/bcbio-nextgen/bcbio_data/anaconda/envs/python2/bin:$PATH && /mnt/e1000/home/u027/u027/pdutta/bcbio-nextgen/bcbio_data/anaconda/envs/python2/bin/smoove call --processes 16 --genotype --removepr --fasta /mnt/e1000/home/u027/u027/pdutta/bcbio-nextgen/bcbio_data/genomes/Hsapiens/hg38/seq/hg38.fa --name NA12878-ceu-svs --outdir /mnt/e1000/home/u027/u027/pdutta/bcbio-nextgen/NA12878-sv-eval/work/bcbiotx/tmpp35dspoe --exclude /mnt/e1000/home/u027/u027/pdutta/bcbio-nextgen/NA12878-sv-eval/work/structural/NA12878/lumpy/NA12878-ceu-svs-smoove.genotyped-exclude.bed --excludechroms '~^GL,~^HLA,~_random,~^chrUn,~alt,~decoy,chrEBV' /mnt/e1000/home/u027/u027/pdutta/bcbio-nextgen/NA12878-sv-eval/work/bcbiotx/tmpp35dspoe/NA12878.bam

No wonder I got a samtools error: the samtools that this command picks up first, via export PATH=/mnt/e1000/home/u027/u027/pdutta/bcbio-nextgen/bcbio_data/anaconda/envs/python2/bin:$PATH, fails with this error: samtools: error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file or directory
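A "cannot open shared object file" error like this one can usually be confirmed without running the tool, by asking ldd which libraries it cannot resolve. The sketch below is an assumption-laden illustration: the helper names are hypothetical, and it relies on the common glibc ldd output format in which unresolved entries end in "=> not found".

```python
import subprocess


def missing_libraries(ldd_output):
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Unresolved entries look like: "libcrypto.so.1.0.0 => not found"
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def ldd_missing(binary):
    """Run ldd on `binary` and list its unresolved shared libraries."""
    proc = subprocess.run(["ldd", binary], stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE, universal_newlines=True)
    return missing_libraries(proc.stdout)


# Example (path as reported in this thread):
# print(ldd_missing("./bcbio_data/anaconda/envs/python2/bin/samtools"))
```

Note that this only catches libraries that are missing outright. A library that is found but lacks a symbol, such as the libgsl case reported later in this thread, would not show up as "not found".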

When I reran the above command with the samtools path changed to a working copy, it ran fine until it hit another error, this time from bcftools: bcftools: symbol lookup error: /mnt/e1000/home/u027/u027/pdutta/bcbio-nextgen/bcbio_data/anaconda/bin/../lib/libgsl.so.25: undefined symbol: cblas_ctrmv

There are many samtools/bcftools instances installed, and different programs pick up different ones. Is there a workaround for this? How can I make the pipeline use only the copies that work? Actually, the real question is: why do some of the installed tools fail with these shared library errors in the first place? Any workarounds would be greatly appreciated. Is this specific to bcbio v1.2.9?
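To see which of the many copies a given command would actually pick up, it can help to list the candidates in the order the PATH is searched. This is a minimal sketch using only the standard library. The function name resolution_order and its is_runnable parameter are invented for the illustration, and it does not reproduce how bcbio itself locates tools.

```python
import os


def _is_executable_file(path):
    """Default check: a regular file the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


def resolution_order(name, path=None, is_runnable=_is_executable_file):
    """List every `name` found on `path`, first match first.

    The first entry is the copy a shell would run; later entries are shadowed.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    found = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        if is_runnable(candidate) and candidate not in found:
            found.append(candidate)
    return found


# Example: mimic the PATH that the smoove command above exports.
# prefix = "/mnt/e1000/home/u027/u027/pdutta/bcbio-nextgen/bcbio_data/anaconda/envs/python2/bin"
# print(resolution_order("samtools", prefix + os.pathsep + os.environ["PATH"]))
```

Because the smoove command prepends the python2 environment's bin directory, the copy in that directory is listed first, which matches the failure described above.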

Regards, Prasun

prasundutta87 commented 2 years ago

After some reinstallations/updates, I was able to get both samtools and bcftools running in both the default conda environment and the python2 environment. I followed this solution and updated samtools to v1.9: https://github.com/merenlab/anvio/issues/1479

I updated bcftools too, and checked that both tools ran successfully in both environments.