bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

Peddy fails with gvcf data #3616

Closed HudoGriz closed 2 years ago

HudoGriz commented 2 years ago

Hi,

I encountered an issue similar to https://github.com/bcbio/bcbio-nextgen/issues/3506: peddy fails when running with tools_on: gvcf. I repeated the run on the same data twice, once without tools_on: gvcf and once with jointcaller: gatk-haplotype-joint and without a specified bulk name (so that joint calling runs on that sample only). Both times peddy finished successfully. My goal is to obtain gVCF data together with a peddy analysis, to be used later in custom downstream pipelines.

Any ideas on solving the problem? Thanks!

Version info

To Reproduce

Exact bcbio command I have used:

bcbio_nextgen.py ../config/GVCF_analysis.yaml -n 20

My yaml configuration file:

details:
- algorithm:
    aligner: bwa
    archive: cram-lossless
    recalibrate: gatk
    variant_regions: /data/xgen-exome-research-panel-v2-targets-GRCh37-50-flanking.bed
    variantcaller:
    - gatk-haplotype
    tools_on:
    - gvcf

  analysis: variant2
  description: data-wgs
  files:
  - /data/wgs_fastq/data_R1.fastq.gz
  - /data/wgs_fastq/data_R2.fastq.gz
  genome_build: GRCh37
  metadata:
    batch: data_wgs
    phenotype: unknown
    sex: unknown

fc_name: GVCF_analysis
upload:
  dir: ../final_gvcf

bcbio-nextgen.log

[2022-02-09T14:40Z] multiprocessing: run_peddy
[2022-02-09T14:40Z] Uncaught exception occurred
Traceback (most recent call last):
  File "/home/user/BCBIO/bcbio_dir/anaconda/lib/python3.6/site-packages/bcbio/provenance/do.py", line 26, in run
    _do_run(cmd, checks, log_stdout, env=env)
  File "/home/user/BCBIO/bcbio_dir/anaconda/lib/python3.6/site-packages/bcbio/provenance/do.py", line 106, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
subprocess.CalledProcessError: Command 'set -o pipefail; export LC_ALL=C.UTF-8 && export LANG=C.UTF-8 &&  /home/user/BCBIO/bcbio_dir/anaconda/bin/peddy -p 10  --plot --prefix /data/GVCF_analysis/work/bcbiotx/tmpx4n_kp8v/data_wgs /data/GVCF_analysis/work/gatk-haplotype/data_wgs-effects-annotated-nomissingalt-filterSNP-filterINDEL.vcf.gz /data/GVCF_analysis/work/gatk-haplotype/data_wgs-effects-annotated-nomissingalt-filterSNP-filterINDEL.ped 2> /data/GVCF_analysis/work/bcbiotx/tmpx4n_kp8v/run-stderr.log
' returned non-zero exit status 1.
[2022-02-09T14:40Z] Skipping peddy because no variants overlap with checks: GVCF_analysis_wgs

naumenko-sa commented 2 years ago

Hi @HudoGriz !

Thanks for reporting! It seems that peddy does not support gVCF input. I've introduced a workaround: in that case a VCF file is generated on the fly. Please let me know if it works for you. You will need to update to the latest devel code; please also note that your bcbio runs on python3.6, which means it is pre-1.2.9.
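For readers wondering what "generating a VCF on the fly" from a gVCF could involve, here is a minimal sketch. It is not the actual bcbio change; the function name `variant_records_only` and the filtering rule are assumptions made for illustration. It keeps header lines and drops reference-only blocks, i.e. records whose ALT column contains nothing but the symbolic `<NON_REF>` allele.

```python
NON_REF = "<NON_REF>"  # symbolic allele GATK writes into gVCF records


def variant_records_only(lines):
    """Yield header lines and records that carry at least one real ALT allele.

    Hypothetical helper for illustration only, not bcbio's implementation.
    Reference blocks (ALT is just <NON_REF> or ".") are skipped. The ALT
    column of kept records is left unchanged, because removing <NON_REF>
    would also require rewriting per-allele FORMAT fields such as AD and PL.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            yield line  # meta-information and column header pass through
            continue
        alt_field = line.split("\t")[4]  # ALT is the 5th VCF column
        real_alts = [a for a in alt_field.split(",") if a not in (NON_REF, ".")]
        if real_alts:
            yield line


example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tdata_wgs",
    "1\t100\t.\tA\t<NON_REF>\t.\t.\tEND=200\tGT\t0/0",
    "1\t201\t.\tC\tT,<NON_REF>\t50\t.\t.\tGT\t0/1",
]
filtered = list(variant_records_only(example))
```

In the example, the reference block at position 100 is dropped and the variant record at position 201 is kept. A production pipeline would more likely rely on an established tool (for instance, joint genotyping the gVCF with GATK) than on line filtering like this.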

Sergey

HudoGriz commented 2 years ago

Hi @naumenko-sa ! The fix works fine.

Much obliged,

Blaž