Illumina / manta

Structural variant and indel caller for mapped sequencing data
GNU General Public License v3.0

Execute Manta on specific chromosomes #176

Closed by daormar 5 years ago

daormar commented 5 years ago

Hi,

I'm trying to run Manta with a pair of normal/tumor bam files. The commands I execute are the following:

configManta.py --normalBam <normbamfile> --tumorBam <tumbamfile> --referenceFasta <reffile> --callRegions <file.tbi> --runDir <outdir>
<outdir>/runWorkflow.py -m local -j 1

Since I don't have the original genome reference file used to generate the bam files, I've created a reference file containing only the primary contigs and filtered the bam files accordingly using samtools. The header is also modified to remove the @SQ entries corresponding to non-primary contigs (since otherwise Manta does not work).
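For illustration, the header edit described above could look like the following Python sketch. It works on the text header (for example, as printed by samtools view -H). The function name filter_sq_lines and the kept_contigs set are hypothetical names introduced here; they are not part of Manta or samtools.

```python
def filter_sq_lines(header_lines, kept_contigs):
    """Return the SAM header lines, dropping @SQ entries for removed contigs.

    header_lines: iterable of header lines (strings starting with '@').
    kept_contigs: set of contig names to keep (assumed: the primary contigs).
    """
    kept = []
    for line in header_lines:
        if line.startswith("@SQ"):
            # An @SQ line is tab-separated; the contig name is in the SN: field.
            fields = line.rstrip("\n").split("\t")[1:]
            name = next((f[3:] for f in fields if f.startswith("SN:")), None)
            if name not in kept_contigs:
                continue  # skip the entry for a contig that was filtered out
        kept.append(line)
    return kept
```

The filtered header could then be written back with a tool such as samtools reheader. Note that this only touches the header; individual reads may still mention the removed contigs in their fields.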

Under these circumstances, Manta reports a series of errors and aborts (I've attached the output to this post). The problem is related to certain reads containing the SA optional field. Specifically, some SA fields refer to non-primary contigs that were removed when filtering the bam files. I've also attached the information contained in the (tumor) bam file for one of those problematic reads.

I was thinking that one possible workaround would be to systematically remove those SA entries pointing to removed contigs. However, I was wondering if there is a standard approach to solve this problem, since I think it shouldn't be uncommon to face a situation like this where the original genome reference file used to generate a set of bam files is unknown or unavailable.
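A minimal sketch of that workaround, in Python, operating on SAM text records (one tab-separated line per read). It relies on the SA tag layout from the SAM specification, where each entry is "rname,pos,strand,CIGAR,mapQ,NM" and is terminated by a semicolon. The function name strip_removed_sa and the kept_contigs set are hypothetical, and whether Manta accepts the resulting file has not been verified here.

```python
def strip_removed_sa(sam_line, kept_contigs):
    """Drop SA entries that point to contigs outside kept_contigs.

    sam_line: one SAM line (header lines are returned unchanged).
    kept_contigs: set of contig names still present in the reference.
    """
    if sam_line.startswith("@"):
        return sam_line.rstrip("\n")
    fields = sam_line.rstrip("\n").split("\t")
    out = fields[:11]  # the 11 mandatory SAM columns
    for tag in fields[11:]:
        if tag.startswith("SA:Z:"):
            # Split the tag value into its ';'-terminated entries.
            entries = [e for e in tag[5:].split(";") if e]
            # Keep only entries whose first item (rname) is a kept contig.
            kept = [e for e in entries if e.split(",", 1)[0] in kept_contigs]
            if not kept:
                continue  # nothing left: drop the SA tag entirely
            tag = "SA:Z:" + ";".join(kept) + ";"
        out.append(tag)
    return "\t".join(out)
```

This could be applied to each line of a SAM text stream before converting back to bam. It only rewrites the SA tag; other fields that may name a removed contig (such as the mate reference column) are left as they are.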

Regards and thanks in advance,
Daniel

runWorkflow.log read_info.txt

x-chen commented 5 years ago

Hi,

Manta is designed to QC read alignments against the reference used. Any inconsistency among the alignment reference, the SV reference, and the read alignments triggers an error.

Have you considered regenerating the alignment from the existing bam with an available reference?

daormar commented 5 years ago

Hi,

No, I hadn't considered the option you mention. Since I need to process samples from whole data sets, this will be computationally expensive. However, it's probably the way to go. Thanks!

Daniel