bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

Realign bam files in bcbio #2541

Closed Anniestasy closed 4 years ago

Anniestasy commented 5 years ago

I got an error: "Input files reference and reads have incompatible contigs: Found contigs with the same name but different lengths: contig reference = MT / 16569 contig reads = MT / 16571." As I understand it, one obvious solution is to realign the BAM files. But when I enabled realignment in the .yaml configuration file, I got another error: "In sample normal, realign specified but it is not supported for GATK4. Realignment is generally not necessary for most variant callers." This is because bcbio uses GATK4, in which the realignment option was removed. Could you please suggest a solution for this situation?
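
For anyone diagnosing a similar mismatch, here is a minimal sketch of comparing contig lengths between a BAM header and a reference index. It is not part of bcbio; the function names are hypothetical, and the example inputs are made up except for the two MT lengths, which come from the error message above.

```python
def sam_header_contigs(header_text):
    """Return {contig_name: length} parsed from the @SQ lines of a SAM header."""
    contigs = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Each tab-separated field after the record type looks like TAG:value
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        contigs[fields["SN"]] = int(fields["LN"])
    return contigs


def fai_contigs(fai_text):
    """Return {contig_name: length} from the first two columns of a .fai index."""
    contigs = {}
    for line in fai_text.splitlines():
        if line.strip():
            name, length = line.split("\t")[:2]
            contigs[name] = int(length)
    return contigs


def length_mismatches(ref, reads):
    """List (name, ref_length, reads_length) for shared contigs whose lengths differ."""
    return [(n, ref[n], reads[n]) for n in sorted(ref) if n in reads and ref[n] != reads[n]]


# Illustrative inputs only; the MT lengths mirror the reported error.
header = "@HD\tVN:1.5\tSO:coordinate\n@SQ\tSN:1\tLN:249250621\n@SQ\tSN:MT\tLN:16571\n"
fai = "1\t249250621\t52\t60\t61\nMT\t16569\t3153649455\t70\t71\n"
print(length_mismatches(fai_contigs(fai), sam_header_contigs(header)))
# prints [('MT', 16569, 16571)]
```

In practice the header text could come from `samtools view -H your.bam` and the index from the `.fai` file that `samtools faidx` writes next to the reference FASTA. A mismatch like this means the BAM was aligned against a different reference build than the one bcbio is configured to use.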

chapmanb commented 5 years ago

Thanks for the report and sorry about the issue and confusion. If you'd like to reanalyze from an existing BAM, the right thing to do is set an aligner:


```
files: your_original.bam
algorithm:
  aligner: bwa
```

bcbio will extract the reads from the original BAM and feed them into alignment and downstream steps. `realign` is different functionality (adjusting reads after initial alignment), hence the confusing error message. Hope this helps get your data processed.

Anniestasy commented 5 years ago

I set an aligner, reran the analysis, and got: ValueError: Failed to check paired status of BAM file. Is there a way to solve this?

And thank you for the fast response.

chapmanb commented 5 years ago

Thanks for testing and sorry about the continued issues. Would you be able to paste the full error traceback you're seeing? Specifically, the error you reported should be followed by additional information detailing why it failed, which might help diagnose what is going on. bcbio is trying to call a couple of samtools commands to check if the file has paired reads so it knows how to extract into fastq:

https://github.com/bcbio/bcbio-nextgen/blob/a58cbadb95f2192a25b21fbbfdb24980ce49bc53/bcbio/bam/__init__.py#L48

Thanks for the help debugging.
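
To illustrate what a paired-status check looks at, here is a minimal sketch that inspects the FLAG column of SAM text records for the 0x1 "read paired" bit. It is not the bcbio implementation linked above, which shells out to command-line tools; `is_paired_sam` and its `max_records` parameter are hypothetical names used only for this example.

```python
def is_paired_sam(sam_lines, max_records=1000):
    """Return True if any of the first max_records alignment records is flagged as paired."""
    seen = 0
    for line in sam_lines:
        # Skip header lines (they start with '@') and blank lines
        if line.startswith("@") or not line.strip():
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if flag & 0x1:  # 0x1 = template has multiple segments (paired)
            return True
        seen += 1
        if seen >= max_records:
            break
    return False


# Made-up records: flag 99 is a properly paired first-in-pair read, flag 0 is unpaired.
paired = ["@HD\tVN:1.5", "r1\t99\t1\t100\t60\t4M\t=\t200\t104\tACGT\tFFFF"]
single = ["r2\t0\t1\t100\t60\t4M\t*\t0\t0\tACGT\tFFFF"]
print(is_paired_sam(paired), is_paired_sam(single))
# prints True False
```

On the command line, `samtools view -c -f 1 your.bam` gives a comparable answer by counting records with the paired flag set.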

Anniestasy commented 5 years ago

multiprocessing: prep_align_inputs
Traceback (most recent call last):
  File "/media/hdc/opt/install/dir/bin/bcbio_nextgen.py", line 242, in <module>
    main(**kwargs)
  File "/media/hdc/opt/install/dir/bin/bcbio_nextgen.py", line 47, in main
    run_main(**kwargs)
  File "/media/hdc/opt/install/bin/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 45, in run_main
    fc_dir, run_info_yaml)
  File "/media/hdc/opt/install/bin/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 89, in _run_toplevel
    for xs in pipeline(config, run_info_yaml, parallel, dirs, samples):
  File "/media/hdc/opt/install/bin/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 128, in variant2pipeline
    samples = run_parallel("prep_align_inputs", samples)
  File "/media/hdc/opt/install/bin/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 28, in run_parallel
    return run_multicore(fn, items, config, parallel=parallel)
  File "/media/hdc/opt/install/bin/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 86, in run_multicore
    for data in joblib.Parallel(parallel["num_jobs"], batch_size=1)(joblib.delayed(fn)(x) for x in items):
  File "/media/hdc/opt/install/bin/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 983, in __call__
    if self.dispatch_one_batch(iterator):
  File "/media/hdc/opt/install/bin/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 825, in dispatch_one_batch
    self._dispatch(tasks)
  File "/media/hdc/opt/install/bin/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 782, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/media/hdc/opt/install/bin/bcbio/anaconda/lib/python2.7/site-packages/joblib/_parallel_backends.py", line 182, in apply_async
    result = ImmediateResult(func)
  File "/media/hdc/opt/install/bin/bcbio/anaconda/lib/python2.7/site-packages/joblib/_parallel_backends.py", line 545, in __init__
    self.results = batch()
  File "/media/hdc/opt/install/bin/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 261, in __call__
    for func, args, kwargs in self.items]
  File "/media/hdc/opt/install/bin/bcbio/anaconda/lib/python2.7/site-packages/bcbio/utils.py", line 52, in wrapper
    return apply(f, *args, **kwargs)
  File "/media/hdc/opt/install/bin/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multitasks.py", line 176, in prep_align_inputs
    return alignprep.create_inputs(*args)
  File "/media/hdc/opt/install/bin/bcbio/anaconda/lib/python2.7/site-packages/bcbio/ngsalign/alignprep.py", line 42, in create_inputs
    data["files"] = _prep_fastq_inputs(data["files"], data)
  File "/media/hdc/opt/install/bin/bcbio/anaconda/lib/python2.7/site-packages/bcbio/ngsalign/alignprep.py", line 336, in _prep_fastq_inputs
    out = _bgzip_from_bam(in_files[0], data["dirs"], data)
  File "/media/hdc/opt/install/bin/bcbio/anaconda/lib/python2.7/site-packages/bcbio/ngsalign/alignprep.py", line 538, in _bgzip_from_bam
    if not bam.is_paired(bam_file):
  File "/media/hdc/opt/install/bin/bcbio/anaconda/lib/python2.7/site-packages/bcbio/bam/__init__.py", line 63, in is_paired
    raise ValueError("Failed to check paired status of BAM file: %s" % str(stderr))
ValueError: Failed to check paired status of BAM file: sambamba 0.6.8 by Artem Tarasov and Pjotr Prins (C) 2012-2018
sambamba 0.6.8 by Artem Tarasov and Pjotr Prins (C) 2012-2018

    LDC 1.11.0 / DMD v2.081.2 / LLVM6.0.1 / bootstrap LDC - the LLVM D compiler (0.17.6git-0156298)
    LDC 1.11.0 / DMD v2.081.2 / LLVM6.0.1 / bootstrap LDC - the LLVM D compiler (0.17.6git-0156298)

chapmanb commented 5 years ago

Thanks much for following up with all the details. This is an issue with the latest sambamba, which exports a status line that confuses bcbio. This has been fixed in the latest development version, which we're planning to roll into a stable release in the next couple of days. If you run:

bcbio_nextgen.py upgrade -u development

and then re-run your analysis, it should hopefully proceed on to alignment. Apologies about the issues and hope this gets you going.
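
As a rough illustration of the failure mode described here (an informational banner on stderr being mistaken for an error), the sketch below filters banner lines before deciding whether anything went wrong. This is an assumption about one way to handle it, not the actual fix in the bcbio development version; `BANNER_PATTERNS` and `real_errors` are hypothetical names, and the patterns are derived only from the stderr text in the traceback above.

```python
import re

# Hypothetical patterns matching the version banner lines seen in the reported stderr.
BANNER_PATTERNS = [
    re.compile(r"^sambamba \d"),
    re.compile(r"^LDC \d"),
]


def real_errors(stderr_text):
    """Return non-empty stderr lines that do not look like version banner lines."""
    lines = [line.strip() for line in stderr_text.splitlines() if line.strip()]
    return [line for line in lines if not any(p.match(line) for p in BANNER_PATTERNS)]


banner_only = (
    "sambamba 0.6.8 by Artem Tarasov and Pjotr Prins (C) 2012-2018\n"
    "    LDC 1.11.0 / DMD v2.081.2 / LLVM6.0.1 / bootstrap LDC - the LLVM D compiler (0.17.6git-0156298)\n"
)
print(real_errors(banner_only))
# prints []
```

With a filter like this, a check would raise only when `real_errors` returns something, so a banner-only stderr would no longer be treated as a failure.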

roryk commented 4 years ago

This should be fixed; the issue just wasn't closed. Thank you!