IARCbioinfo / needlestack

Multi-sample somatic variant caller
GNU General Public License v3.0

correct insertion extraction from mpileup #175

Open tdelhomme opened 6 years ago

tdelhomme commented 6 years ago

At the moment, samtools mpileup gives an "=" at the end of some insertions, and we have opened an issue for that.

This produces VCF lines like the following, where ALT contains (or is) an equals sign:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  IonXpress_005   IonXpress_019   IonXpress_020   IonXpress_030   IonXpress_053   IonXpress_104   IonXpress_105
chr11   62397115    .   -   =   152.793 .   TYPE=ins;NS=103;AF=0.00970874;DP=43980;RO=37749;AO=10;SRF=17260;SRR=20489;SAF=5;SAR=5;SOR=0.536285;RVSB=0.542758;FS=0;ERR=5.01679e-05;SIG=0.001;CONT=ATGxCCC;AC=0;AN=14 GT:QVAL:DP:RO:AO:AF:SB:SOR:RVSB:FS:QVAL_minAF:STATUS    0/0:0:202:177:0:0:112,65,0,0:-1:-1:0:1217.54:.  0/0:0:298:248:0:0:120,128,0,0:-1:-1:0:1850.99:. 0/0:0:306:244:0:0:99,145,0,0:-1:-1:0:1879.87:.  0/0:0:469:414:0:0:181,233,0,0:-1:-1:0:2952.58:. 0/0:0:401:349:0:0:138,211,0,0:-1:-1:0:2512.24:. 0/0:0:672:557:0:0:229,328,0,0:-1:-1:0:inf:. 0/0:0:713:623:0:0:322,301,0,0:-1:-1:0:inf:.

Either this is a samtools bug and we just need to wait for it to be fixed, or we need to adapt the mpileup2readcounts script.

tdelhomme commented 6 years ago

This is actually an 8-year-old htslib bug, reported here. For the moment, we should adapt our mpileup2readcounts until they fix it.
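
As a rough illustration of what such a workaround could look like, here is a minimal Python sketch (not the actual mpileup2readcounts code). It walks the read-bases column of one mpileup line, pulls out insertion sequences, and strips any "=" characters from them. The function name `extract_insertions` and the policy of dropping an insertion that consists only of "=" are assumptions made for this example; whether such insertions should be dropped or kept as a placeholder is a decision for the caller.

```python
from typing import List


def extract_insertions(bases: str) -> List[str]:
    """Return the insertion sequences found in an mpileup read-bases string.

    Hypothetical helper: '=' characters inside an insertion are removed,
    and an insertion made only of '=' is skipped (assumed policy).
    """
    insertions = []
    i, n = 0, len(bases)
    while i < n:
        c = bases[i]
        if c == "^":
            # '^' marks a read start and is followed by one mapping-quality
            # character, which may itself be '+' or '-', so skip both.
            i += 2
        elif c in "+-" and i + 1 < n and bases[i + 1].isdigit():
            # Indel: sign, decimal length, then that many sequence characters.
            j = i + 1
            while j < n and bases[j].isdigit():
                j += 1
            length = int(bases[i + 1:j])
            seq = bases[j:j + length]
            if c == "+":
                cleaned = seq.replace("=", "")
                if cleaned:
                    insertions.append(cleaned)
            i = j + length
        else:
            i += 1
    return insertions


if __name__ == "__main__":
    # Three insertions: "AG", a spurious "=", and "ac" followed by "=".
    print(extract_insertions(".,+2AG.^+.+1=,+3ac="))
```

With this policy, a site like the chr11:62397115 one above, where the only alternate allele is "=", would produce no insertion call at all instead of a VCF line with ALT set to "=".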