bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

Starting tumor calling #1944

Closed (bioinfo-dirty-jobs closed this issue 7 years ago)

bioinfo-dirty-jobs commented 7 years ago

I am trying to run the tumor-paired analysis and I get this error:


.
├── bcbio_sample.yaml
├── bcbiotx
├── BED
├── checkpoints_parallel
├── configuration
│   └── tumor.test.yaml
├── final
├── log
│   ├── bcbio-nextgen-commands.log
│   ├── bcbio-nextgen-debug.log
│   └── bcbio-nextgen.log
├── NA12878-exome-methodcmp-getdata.sh
├── NA12878-exome-methodcmp.yaml
├── out.txt
├── RAW
│   ├── 411_GCCAAT_L001_R1_001.fastq.gz
│   ├── 411_GCCAAT_L001_R2_001.fastq.gz
│   ├── 412_CTTGTA_L001_R1_001.fastq.gz
│   └── 412_CTTGTA_L001_R2_001.fastq.gz
├── truseq-exome-targeted-regions-manifest-v1-2.bed
└── tumor-paired.yaml
[centos@pol-produ 411]$ bcbio_nextgen.py configuration/tumor.test.yaml  -n 4
[2017-05-23T15:56Z] System YAML configuration: /usr/local/share/bcbio/galaxy/bcbio_system.yaml
[2017-05-23T15:56Z] Resource requests: bwa, sambamba, samtools; memory: 3.00, 3.00, 3.00; cores: 16, 16, 16
[2017-05-23T15:56Z] Configuring 1 jobs to run, using 4 cores each with 12.1g of memory reserved for each job
[2017-05-23T15:56Z] Timing: organize samples
[2017-05-23T15:56Z] multiprocessing: organize_samples
[2017-05-23T15:56Z] Using input YAML configuration: /home/centos/Calling/411/configuration/tumor.test.yaml
Traceback (most recent call last):
  File "/usr/local/bin/bcbio_nextgen.py", line 234, in <module>
    main(**kwargs)
  File "/usr/local/bin/bcbio_nextgen.py", line 43, in main
    run_main(**kwargs)
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 42, in run_main
    fc_dir, run_info_yaml)
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 86, in _run_toplevel
    for xs in pipeline(config, run_info_yaml, parallel, dirs, samples):
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 119, in variant2pipeline
    [x[0]["description"] for x in samples]]])
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 28, in run_parallel
    return run_multicore(fn, items, config, parallel=parallel)
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 86, in run_multicore
    for data in joblib.Parallel(parallel["num_jobs"], batch_size=1)(joblib.delayed(fn)(x) for x in items):
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 804, in __call__
    while self.dispatch_one_batch(iterator):
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 662, in dispatch_one_batch
    self._dispatch(tasks)
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 570, in _dispatch
    job = ImmediateComputeBatch(batch)
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 183, in __init__
    self.results = batch()
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 72, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/utils.py", line 51, in wrapper
    return apply(f, *args, **kwargs)
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multitasks.py", line 350, in organize_samples
    return run_info.organize(*args)
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/run_info.py", line 50, in organize
    integrations=integrations)
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/run_info.py", line 749, in _run_info_from_yaml
    item = _normalize_files(item, dirs.get("flowcell"))
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/run_info.py", line 668, in _normalize_files
    _sanity_check_files(item, files)
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/run_info.py", line 689, in _sanity_check_files
    raise ValueError("%s for %s: %s" % (msg, item.get("description", ""), files))
ValueError: Expect both fastq files to not be the same for 412-normal: ['/home/centos/Calling/411/RAW/412_CTTGTA_L001_R1_001.fastq.gz', '/home/centos/Calling/411/RAW/412_CTTGTA_L001_R1_001.fastq.gz']
details:
- algorithm:
    aligner: bwa
    align_split_size: 5000000
    nomap_split_targets: 100
    mark_duplicates: true
    recalibrate: true
    realign: true
    remove_lcr: true
    platform: illumina
    quality_format: standard
    variantcaller: [mutect, freebayes, vardict, varscan]
    indelcaller: false
    ensemble:
      numpass: 2
    variant_regions: /home/centos/Calling/411/truseq-exome-targeted-regions-manifest-v1-2.bed
    # svcaller: [cnvkit, lumpy, delly]
    # coverage_interval: amplicon
  analysis: variant2
  description: 412-normal
  #files: ../input/synthetic.challenge.set3.normal.bam
  files:
    - /home/centos/Calling/411/RAW/412_CTTGTA_L001_R1_001.fastq.gz
    - /home/centos/Calling/411/RAW/412_CTTGTA_L001_R1_001.fastq.gz

  genome_build: GRCh37
  metadata:
    batch: 412
    phenotype: normal
- algorithm:
    aligner: bwa
    align_split_size: 5000000
    nomap_split_targets: 100
    mark_duplicates: true
    recalibrate: true
    realign: true
    remove_lcr: true
    platform: illumina
    quality_format: standard
    variantcaller: [mutect, freebayes, vardict, varscan]
    indelcaller: false
    ensemble:
      numpass: 2
    variant_regions: /home/centos/Calling/411/truseq-exome-targeted-regions-manifest-v1-2.bed
    # coverage_interval: amplicon
  #   svvalidate:
  #     DEL: ../input/synthetic_challenge_set3_tumor_20pctmasked_truth_sv_DEL.bed
  #     DUP: ../input/synthetic_challenge_set3_tumor_20pctmasked_truth_sv_DUP.bed
  #     INS: ../input/synthetic_challenge_set3_tumor_20pctmasked_truth_sv_INS.bed
  #     INV: ../input/synthetic_challenge_set3_tumor_20pctmasked_truth_sv_INV.bed
  analysis: variant2
  description: 411-tumor
  #files: ../input/synthetic.challenge.set3.tumor.bam
  files:
    - /home/centos/Calling/411/RAW/411_GCCAAT_L001_R1_001.fastq.gz
    - /home/centos/Calling/411/RAW/411_GCCAAT_L001_R2_001.fastq.gz
  genome_build: GRCh37
  metadata:
    batch: syn3
    phenotype: tumor
fc_date: '2017-05-23'
fc_name: DGASP-test
upload:
  dir: ../final
chapmanb commented 7 years ago

Thanks for the detailed report and sorry about the issues. This is the informative error:

ValueError: Expect both fastq files to not be the same for 412-normal: ['/home/centos/Calling/411/RAW/412_CTTGTA_L001_R1_001.fastq.gz', '/home/centos/Calling/411/RAW/412_CTTGTA_L001_R1_001.fastq.gz']

You've listed the same file (412_CTTGTA_L001_R1_001.fastq.gz) twice under files for the 412-normal sample. You probably want the R1 and R2 files there, so the two entries specify the paired ends. Hope this helps.
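
For anyone hitting the same error, here is a minimal sketch of a pre-flight check that flags a sample whose files list repeats a path. The helper name `find_duplicate_inputs` and the dict layout are assumptions made for illustration; this is not bcbio's own `_sanity_check_files`. The paths are the ones from the report above, and the corrected entry uses the R2 file that appears in the RAW directory listing.

```python
# Hypothetical helper for illustration only; not part of bcbio.
def find_duplicate_inputs(samples):
    """Map each sample description to the input paths it lists more than once."""
    problems = {}
    for sample in samples:
        files = sample.get("files") or []
        if isinstance(files, str):  # a single input given as a bare string
            files = [files]
        repeated = sorted({f for f in files if files.count(f) > 1})
        if repeated:
            problems[sample.get("description", "")] = repeated
    return problems


RAW = "/home/centos/Calling/411/RAW"

# The sample as posted: R1 is listed twice.
broken = [
    {"description": "412-normal",
     "files": [RAW + "/412_CTTGTA_L001_R1_001.fastq.gz",
               RAW + "/412_CTTGTA_L001_R1_001.fastq.gz"]},
]

# The likely intended entry: R1 paired with R2.
fixed = [
    {"description": "412-normal",
     "files": [RAW + "/412_CTTGTA_L001_R1_001.fastq.gz",
               RAW + "/412_CTTGTA_L001_R2_001.fastq.gz"]},
]

print(find_duplicate_inputs(broken))
print(find_duplicate_inputs(fixed))
```

With the configuration as posted, the check reports 412-normal. Once the second entry points at the R2 file, it returns an empty dict.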