bigdatagenomics / adam

ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.

PhredUtils conversion to log probabilities has insufficient resolution for PLs #1569

Closed. fnothaft closed this issue 7 years ago

fnothaft commented 7 years ago

Genotype Phred-scaled likelihoods (PLs) aren't restricted to values below 256, unlike Phred scores for reads, which typically are. As a result, we clip large PLs when converting them to log probabilities, e.g.:

chr22   18030096    .   TAAA    T,TA,TAA,<NON_REF>  564.73  .   BaseQRankSum=-0.133;ClippingRankSum=-1.438;DP=114;MLEAC=0,1,1,0;MLEAF=0.00,0.500,0.500,0.00;MQ=69.72;MQ0=0;MQRankSum=-0.686;ReadPosRankSum=-0.013   GT:AD:DP:GQ:PL  2/3:13,3,17,17,0:50:86:602,508,1628,86,678,553,137,342,0,281,467,744,353,309,659
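
To make the loss of resolution concrete, here is a minimal sketch in Python. It is not ADAM's PhredUtils code: the cap of 255 and both function names are assumptions chosen to mirror the "<256" range mentioned above. It compares a clipped conversion against the direct formula log p = -PL * ln(10) / 10, using the first few PLs from the record above.

```python
import math

# Hypothetical cap, mirroring the "<256" range typical of read qualities.
MAX_TABLE_PHRED = 255


def clipped_phred_to_log_prob(phred: int) -> float:
    """Hypothetical conversion that clips the score to 0..255 first."""
    clipped = min(max(phred, 0), MAX_TABLE_PHRED)
    return -clipped / 10.0 * math.log(10.0)


def phred_to_log_prob(phred: int) -> float:
    """Direct conversion with no clipping: log(10^(-phred/10))."""
    return -phred / 10.0 * math.log(10.0)


# First five PL values from the record above.
pls = [602, 508, 1628, 86, 678]
clipped = [clipped_phred_to_log_prob(pl) for pl in pls]
direct = [phred_to_log_prob(pl) for pl in pls]

for pl, c, d in zip(pls, clipped, direct):
    print(f"PL={pl:5d}  clipped={c:10.4f}  direct={d:10.4f}")
```

Under these assumptions, every PL above 255 collapses to the same clipped log probability, so 602, 508, and 1628 become indistinguishable, while the direct conversion keeps them apart.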

fnothaft commented 7 years ago

See further discussion of accuracy for converting log(1 - p) to log p here.
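
As an illustration of why this conversion is delicate, the sketch below reads it as going from x = log(1 - p) to log p = log(1 - exp(x)); that reading is an assumption, and the helper name `log1mexp` is illustrative rather than anything in ADAM. It uses the common two-branch approach built on `math.expm1` and `math.log1p`, since the naive `log(1 - exp(x))` loses all precision once `exp(x)` rounds to 1.0 in double precision, which happens for probabilities as small as those implied by large PLs.

```python
import math


def log1mexp(x: float) -> float:
    """Return log(1 - exp(x)) for x < 0, picking the more accurate branch."""
    if x >= 0.0:
        raise ValueError("x must be negative")
    if x > -math.log(2.0):
        # exp(x) is close to 1: expm1 avoids cancellation in 1 - exp(x).
        return math.log(-math.expm1(x))
    # exp(x) is small: log1p keeps precision for log(1 - tiny).
    return math.log1p(-math.exp(x))


# x = log(1 - p) for p around 1e-20; the naive formula would compute log(0).
near_zero = log1mexp(-1e-20)
# x far from zero: the result is approximately -exp(x).
far = log1mexp(-100.0)
print(near_zero, far)
```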