Illumina / Pisces

Somatic and germline variant caller for amplicon data. Recommended caller for tumor-only workflows.
GNU General Public License v3.0
92 stars 16 forks source link

Pisces does not support multi-Mapped reads #34

Open chadisaad opened 5 years ago

chadisaad commented 5 years ago

When having multiple alignement for a same read. In SAM/BAM format, SEQ and QUAL of secondary alignments are set to ‘*’ to reduce the file size. And this cause problems for Pisces:

Exception reported:  
System.Exception: test.bam: Error processing chr 'chrM': Check alignment NB551082:294:H25LNBGXB:1:13309:24562:12411. Invalid cigar '32H69M': does not match length 0 of read ---> System.IO.InvalidDataException: Check alignment NB551082:294:H25LNBGXB:1:13309:24562:12411. Invalid cigar '32H69M': does not match length 0 of read
   at Pisces.Domain.Models.Read.ValidateCigar(CigarAlignment cigarData, Int32 readLength, String readName)
   at Pisces.Domain.Models.Read.UpdateFromBam()
   at Pisces.Domain.Models.Read.Reset(String chromosome, BamAlignment bamAlignment)
   at Pisces.IO.BamFileAlignmentExtractor.GetNextAlignment(Read read)
   at Pisces.Logic.Alignment.AlignmentSource.GetNextRead()
   at Pisces.Logic.SomaticVariantCaller.Execute()
   at Pisces.Processing.Logic.BaseGenomeProcessor.ProcessByBam(BamWorkRequest workRequest, String chrName)
   --- End of inner exception stack trace ---.
tamsen commented 5 years ago

Thanks for your interest in Pisces!

OK, issue noted. As soon as I have a version available that handles that case, I'll make a note in this thread.

saty89 commented 5 years ago

Hi All, on a similar note just providing more data. When I processed a bam through Hygea I see the following issue when the alignments are not primary alignments and have "*" as mentioned above -

  1. BWA mem bam (with markdups) - System.Exception: 08700029_merged.markdups.bam: Error processing chr '1': Invalid cigar '43H32M': does not match length 0 of read ---> System.IO.InvalidDataException: Invalid cigar '43H32M': does not match length 0 of read at Pisces.Domain.Models.Read.ValidateCigar(CigarAlignment cigarData, Int32 readLength) at Pisces.Domain.Models.Read.UpdateFromBam() at Pisces.Domain.Models.Read.Reset(String chromosome, BamAlignment bamAlignment) at Pisces.IO.BamFileAlignmentExtractor.GetNextAlignment(Read read) at RealignIndels.Logic.ChrRealigner.Execute() at RealignIndels.Logic.Processing.GenomeProcessor.Process(BamWorkRequest workRequest, ChrReference chrReference) at Pisces.Processing.Logic.BaseGenomeProcessor.ProcessByBam(BamWorkRequest workRequest, String chrName) --- End of inner exception stack trace ---.

Reads in that region -

NS500817:328:H5JL3BGX7:1:23308:7834:19270 435 1 10000 0 43H32M 8 67677964 0 MC:Z:73M MD:Z:32 PG:Z:MarkDuplicatesRG:Z:08700029 NM:i:0 AS:i:32

NS500817:328:H5JL3BGX7:1:23308:7834:19270 339 1 41527993 0 12H61M 8 67677933 0 MC:Z:25H50M MD:Z:6A54 PG:Z:MarkDuplicates RG:Z:08700029 NM:i:1 AS:i:56

NS500817:328:H5JL3BGX7:1:23308:7834:19270 163 8 67677933 60 25S50M = 67677964 104
AGGGTTAGGGTTAGGGTTAGGGTTAGGGTTATTAGGGCTGTATATTTGGATTCAAAATATTTGGATTTTGGCCAG AA///EEEE6//EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAE6AEEEEEEAAEEEEEEEAEEEEEE/EAEEE SA:Z:GL000227.1,73741,-,42S33M,0,0; MC:Z:73M MD:Z:50 PG:Z:MarkDuplicates RG:Z:08700029 NM:i:0 AS:i:50 XS:i:20

NS500817:328:H5JL3BGX7:1:23308:7834:19270 83 8 67677964 38 73M = 67677933 -104 ATATTTGGATTTTGGCCAGGCATGGTGGCTCACGCCTGTAATCCTAGCACTTTGGGAGGCCGAGGTGGGTGGA AEEAEEAE/E/EEAEEEAEEE//<EEEEE/6EEEEEEEA//E/A/EEEA6EEEEA/EEEEAEEEEEEAA/AAA MC:Z:25S50M MD:Z:73 PG:Z:MarkDuplicates RG:Z:08700029 NM:i:0 AS:i:73 XS:i:60

NS500817:328:H5JL3BGX7:1:23308:7834:19270 435 GL000227.1 73741 0 42H33M 8 67677964 0 AATAACCCTAACCCTAACCCTAACCCTAACCCT EEEEEEEEEEEEEEEEEEEEE//6EEEE///AA SA:Z:8,67677933,+,25S50M,60,0; MC:Z:73M MD:Z:33 PG:Z:MarkDuplicates RG:Z:08700029 NM:i:0 AS:i:33 XS:i:33

  1. When used BQSR (GATK) bam which removed such alignments and still has non-primary alignments with SEQ/QUAL information present Hygea was working fine.

NS500817:328:H5JL3BGX7:1:23308:7834:19270 163 8 67677933 60 25S50M = 67677964 104
AGGGTTAGGGTTAGGGTTAGGGTTAGGGTTATTAGGGCTGTATATTTGGATTCAAAATATTTGGATTTTGGCCAG =C233?DDD?B?EEE@B?EEE@B?AB?EEECBE@?A=A3?EEBABC@@BBA?ABBDEABBBB3FACBD SA:Z:GL000227.1,73741,-,42S33M,0,0; MC:Z:73M MD:Z:50 PG:Z:MarkDuplicates RG:Z:08700029 NM:i:0 AS:i:50 XS:i:20

NS500817:328:H5JL3BGX7:1:23308:7834:19270 83 8 67677964 38 73M = 67677933 -104 ATATTTGGATTTTGGCCAGGCATGGTGGCTCACGCCTGTAATCCTAGCACTTTGGGAGGCCGAGGTGGGTGGA ?@B@BCAD0C-CCADFF@DDF0-;DCDDF-8A=CFEBC=10A1C*BCE>8BBBC@2BBCE<BABBAB@C0BA< MC:Z:25S50M MD:Z:73 PG:Z:MarkDuplicates RG:Z:08700029 NM:i:0 AS:i:73 XS:i:60

NS500817:328:H5JL3BGX7:1:23308:7834:19270 435 GL000227.1 73741 0 42H33M 8 67677964 0 AATAACCCTAACCCTAACCCTAACCCTAACCCT BA?B@EEE?B@EEE?B?DDD>116CC<?332C= SA:Z:8,67677933,+,25S50M,60,0; MC:Z:73M MD:Z:33 PG:Z:MarkDuplicates RG:Z:08700029 NM:i:0 AS:i:33 XS:i:33 (the last alignment was not primary alignment but was still fine).

Will be wait for the version that can handle such cases.

Thanks, Satwica

tamsen commented 5 years ago

thanks for the heads up!