cancerit / cgpPindel

Cancer Genome Project Insertion/Deletion detection pipeline based around Pindel
http://cancerit.github.io/cgpPindel/
GNU Affero General Public License v3.0
28 stars 5 forks source link

Differing number of variant read counts for v3.3.0 and v3.5.0 and above #111

Open sb43 opened 7 months ago

sb43 commented 7 months ago

I have tested different versions of pindel to narrow down the read counting issue, it seems the issue has been observed after version v3.3.0 Versions tested and issue reproduced. pindel_3.10.0 (current version) pindel_3.5.0 pindel_3.7.0

VCF record showing difference in DP and PD counts in both tumour and normal samples between pindel versions.

Correct call pindel version (V3.3.0)

tabix ../WTSI-COLO_170_1pre_pindel_old_version_3.3.0/WTSI-COLO_170_1pre_vs_WTSI-COLO_170_b.flagged.vcf.gz chr2:202555406-202555406 chr2 202555406 897c3bb4-f119-11ee-90f0-9928a4fd4733 GA G 1200 PASS FF017;LEN=1;PC=D;RE=202555414;REP=7;RS=202555406;S1=105;S2=1683.32 GT:PP:NP:PB:NB:PD:ND:PR:NR:PU:NU:FD:FC:MTR:WTR:AMB:UNK:VAF ./.:0:0:0:0:15:19:15:19:0:0:34:0:0:32:0:0:0.0000 ./.:14:6:15:8:35:27:35:27:15:8:62:23:25:38:5:0:0.3968

Output from all affected versions

tabix pindel_3.10.0/WTSI-COLO_170_1pre_vs_WTSI-COLO_170_b.flagged.vcf.gz chr2:202555406-202555406 chr2 202555406 97380cb8-f75b-11ee-b828-bb7ec1a1a3fd GA G 1200 FF018 PC=D;RS=202555406;RE=202555414;LEN=1;S1=105;S2=1683.32;REP=7;FF017 GT:PP:NP:PB:NB:PD:ND:PR:NR:PU:NU:FD:FC ./.:0:0:0:0:2:0:2:0:0:0:2:0 ./.:14:6:1:0:2:1:16:7:15:6:23:21

VCF_HEADER:

FORMAT=

FORMAT=

FORMAT=

FORMAT=

FORMAT=

FORMAT=

FORMAT=

FORMAT=

FORMAT=

FORMAT=

FORMAT=

FORMAT=

FORMAT=

FILTER= 10">

Screenshot 2024-04-26 at 15 31 43 Thanks, Shriram

keiranmraine commented 4 months ago

Following our discussion I created a Dockerfile to install the dependencies in a single image and very small script focusing on the Bio::DB::HTS call to get_features_by_location that we identified as returning incorrect values:

https://github.com/cancerit/cgpPindel/blob/7814e379452206d10c67021630b1a028ee30563e/perl/lib/Sanger/CGP/Pindel/OutputGen/CombinedRecordGenerator.pm#L237-L246

Investigation

Reproduce issue in cgppindel 3.10.0 image:

# quay.io/wtsicgp/cgppindel:3.10.0
$ docker run --rm -ti -v $PWD/tmp:/var/spool/data quay.io/wtsicgp/cgppindel:3.10.0 perl /var/spool/data/htslib-bind.pl | wc -l
2

Command runs docker, mounting a folder containing the script and bam/bai files, inline executes the script and counts the reads.

$ tree tmp/
tmp/
├── htslib-bind.pl
├── normal.sample.dupmarked.bam
└── normal.sample.dupmarked.bam.bai

Reproduce and prove fix with minimal image for speed of rebuild:

# reproduced: hts-lib-1.12
$ docker run --rm -ti -v $PWD/tmp:/var/spool/data hts-lib-1.12 perl /var/spool/data/htslib-bind.pl | wc -l
2
# fixed: hts-lib-1.20
$ docker run --rm -ti -v $PWD/tmp:/var/spool/data hts-lib-1.20 perl /var/spool/data/htslib-bind.pl | wc -l
34

Remediation actions

You need to update all of you image stack to 1.20 to ensure this isn't impacting any other tooling. For pindel specifically this is:

You should review the other code stacks as most are built on top of PCAP-core.

htslib-bind.pl

Expects files to be mounted to /var/spool/data/normal.sample.dupmarked.bam (inc index).

#!perl

use strict;
use Const::Fast qw(const);
use Bio::DB::HTS;

const my $F_UNMAPPED    => 4;
const my $F_NOT_PRIMARY => 256;
const my $F_QC_FAIL     => 512;
const my $F_DUPLICATE   => 1024;
const my $F_SUPP_ALIGN  => 2048;
const my @FILTER_READS_IF_FLAG => ($F_DUPLICATE, $F_UNMAPPED, $F_SUPP_ALIGN, $F_QC_FAIL, $F_NOT_PRIMARY);

my $sam = Bio::DB::HTS->new(-bam => '/var/spool/data/normal.sample.dupmarked.bam');
my $seqid = 'chr2';
my $r_start = 202555406;
my $r_end = 202555406;

my $align_iter = $sam->get_features_by_location(
    -seq_id => $seqid,
    -start => $r_start,
    -end => $r_end,
    -filter => \&_read_filter,
    -iterator => 1,
);

while(my $a = $align_iter->next_seq) {
    print $a->qname."\n";
}

sub _read_filter {
  my $flag = shift->flag;
  for my $test(@FILTER_READS_IF_FLAG) {
    return 0 if(($flag | $test) == $flag);
  }
  return 1;
}
tomdrever commented 1 month ago

This issue has been resolved in cgpPindel 3.11.0