cancerit / cgpCaVEManPostProcessing

Flagging add on to CaVEMan
http://cancerit.github.io/cgpCaVEManPostProcessing/
GNU Affero General Public License v3.0
2 stars 4 forks source link

Fails if all reads have common call at position #35

Closed keiranmraine closed 4 years ago

keiranmraine commented 4 years ago

PR and test example to follow, but here to allow head start on confirming I've understood this correctly @davidrajones

The following block has a problem:

https://github.com/cancerit/cgpCaVEManPostProcessing/blob/c520edd24a75a7d0c530e54129d52be43e2dfc0e/lib/Sanger/CGP/CavemanPostProcessor.pm#L308-L320

In the first block, where it tests for overlapping reads, it only records a result if both reads disagree:

https://github.com/cancerit/cgpCaVEManPostProcessing/blob/c520edd24a75a7d0c530e54129d52be43e2dfc0e/lib/Sanger/CGP/CavemanPostProcessor.pm#L309-L310

But it assigns the count only to the +ve strand:

https://github.com/cancerit/cgpCaVEManPostProcessing/blob/c520edd24a75a7d0c530e54129d52be43e2dfc0e/lib/Sanger/CGP/CavemanPostProcessor.pm#L311-L312

No counts are incremented if the reads agree which results in the process failing.

I think this should be as follows as the subsequent section handles the balancing of +/- strand:

foreach my $readnom( @$readname_arr){
    if(exists $hashed_reads->{$readnom}->{1} && exists $hashed_reads->{$readnom}->{-1}){
        $loc_counts->{1}->{qbase}++;
        $loc_counts->{-1}->{qbase}++;
    }elsif(exists $hashed_reads->{$readnom}->{1}){ # + strand populated only
        $loc_counts->{1}->{qbase}++;

    }else{ # - strand populated only
        $loc_counts->{-1}->{qbase}++;
    }
} # End of iteration through each readname for an initial count

Example data that triggered the problem:

$VAR1 = {
          'HS16_6996:1:1102:12569:60118#2' => {
                                                '-1' => {
                                                          'ln' => 75,
                                                          'rdPos' => 13,
                                                          'primaryalnscore' => 70,
                                                          'rdName' => 'HS16_6996:1:1102:12569:60118#2',
                                                          'softclipcount' => 0,
                                                          'qbase' => 'C',
                                                          'matchesindel' => '',
                                                          'xt' => undef,
                                                          'qual' => 60,
                                                          'qscore' => 9,
                                                          'str' => -1,
                                                          'start' => 146972148
                                                        },
                                                '1' => {
                                                         'str' => 1,
                                                         'start' => 146972110,
                                                         'qscore' => 42,
                                                         'qual' => 60,
                                                         'xt' => undef,
                                                         'qbase' => 'C',
                                                         'matchesindel' => '',
                                                         'softclipcount' => 0,
                                                         'rdName' => 'HS16_6996:1:1102:12569:60118#2',
                                                         'primaryalnscore' => 70,
                                                         'rdPos' => 51,
                                                         'ln' => 75
                                                       }
                                              },
          'HS12_6924:5:1206:17230:96528#2' => {
                                                '-1' => {
                                                          'rdName' => 'HS12_6924:5:1206:17230:96528#2',
                                                          'primaryalnscore' => 65,
                                                          'rdPos' => 16,
                                                          'ln' => 75,
                                                          'start' => 146972145,
                                                          'str' => -1,
                                                          'qual' => 60,
                                                          'xt' => undef,
                                                          'qscore' => 39,
                                                          'matchesindel' => '',
                                                          'qbase' => 'C',
                                                          'softclipcount' => 0
                                                        },
                                                '1' => {
                                                         'softclipcount' => 0,
                                                         'qbase' => 'C',
                                                         'matchesindel' => '',
                                                         'qual' => 40,
                                                         'xt' => undef,
                                                         'qscore' => 4,
                                                         'start' => 146972086,
                                                         'str' => 1,
                                                         'ln' => 75,
                                                         'rdPos' => 75,
                                                         'primaryalnscore' => 72,
                                                         'rdName' => 'HS12_6924:5:1206:17230:96528#2'
                                                       }
                                              }
        };
keiranmraine commented 4 years ago

Okay, so the read selection code needs to test the qscore when selecting the read to use. In both cases above the score is good in one read and poor in the other. Selecting the read with the highest qscore will fix the edge case.

keiranmraine commented 4 years ago

To be closed with #36

keiranmraine commented 4 years ago

Will be released once #37 changes are approved.