TheJacksonLaboratory / LIRICAL

LIkelihood Ratio Interpretation of Clinical AbnormaLities
https://thejacksonlaboratory.github.io/LIRICAL/stable
Other
22 stars 11 forks source link

Possible bug in genotype LR? #517

Closed pnrobinson closed 4 years ago

pnrobinson commented 4 years ago

The performance is uncanged for original etc, but I just tried an example with one pathogenic allele for an autosomal recessive disease. Previously, this would give a decent score Now, we are getting -1.

PIGV: One pathogenic allele detected with autosomal recessive disease. Observed weighted pathogenic variant count: 1.00. λdisease=2. λbackground=0.0110. log10(LR)=-1.000.

In R, with these values, we get a value of about +1

x<-dpois(1,2)/dpois(1,0.1)
> x
[1] 2.991372
> log(x)
[1] 1.095732

I think that some code for some heuristic is wrong/was buggified. Need to test prior to v1 release!

kingmanzhang commented 4 years ago

I saw this in my email. I think you want to double check R doc about dpois. I could not understand your computation in R. Since LR = p(G|D) / p(G|background), I think you want to do (?): x = (ppois(1,2) - ppois(0, 2)) / (ppois(1,0.0110) - ppois(0, 0.0110))

x [1] 24.87858 log10(x) [1] 1.395826

On Wed, Nov 27, 2019 at 5:48 PM Peter Robinson notifications@github.com wrote:

The performance is uncanged for original etc, but I just tried an example with one pathogenic allele for an autosomal recessive disease. Previously, this would give a decent score Now, we are getting -1.

PIGV: One pathogenic allele detected with autosomal recessive disease. Observed weighted pathogenic variant count: 1.00. λdisease=2. λbackground=0.0110. log10(LR)=-1.000.

In R, with these values, we get a value of about +1

x<-dpois(1,2)/dpois(1,0.1)

x

[1] 2.991372

log(x)

[1] 1.095732

I think that some code for some heuristic is wrong/was buggified. Need to test prior to v1 release!

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/TheJacksonLaboratory/LIRICAL/issues/517?email_source=notifications&email_token=AFL4YC2JLK5JDOPX5J75HU3QV32K3A5CNFSM4JSNBQA2YY3PNVWWK3TUL52HS4DFUVEXG43VMWVGG33NNVSW45C7NFSM4H4RELXA, or unsubscribe https://github.com/notifications/unsubscribe-auth/AFL4YC6VMJRTIP6P3HJ3IVTQV32K3ANCNFSM4JSNBQAQ .

--

pnrobinson commented 4 years ago

I am not sure your derivation is correct, and in either case there is a new bug since with either interpretation the LR should be about ten or more to one. I think I added this bug while correcting some other one. Happy Thanksgiving!

pnrobinson commented 4 years ago

I see that I introduced a heuristic for AR diseases where only one pathogenic allele is found. This heuristic reduces performance on the obfuscated AR set from about 58% ion first place to only about 40% in first place. I think that I introduced it after seeing that it also added some false positives to the original analysis. TODO -- redo the analysis and check overall performance with and without this heuristic.

pnrobinson commented 4 years ago

THis can be commented out to remove the heuristic (not checked in)

//            if (strict && inheritanceId.equals(AUTOSOMAL_RECESSIVE) && g2g.pathogenicAlleleCount() < 2) {
//                max = updateMax(HEURISTIC_ONE_ALLELE_FOR_AR_DISEASE, max);
//                maxInheritanceMode = inheritanceId;
//                heuristicOneAlleleAR = true;
//                heuristicPathCountAboveLambda = false;
//            } else
pnrobinson commented 4 years ago

This heuristic slightly improved the rankings at place 1 in the original data but substantially worsened the performance if one allele is missing. The following are the results if after I removed the heuristic. Everything is a tradeoff, but I think this is the better way and probably is more useful with real (noisey) data.

1: 286 (74.5%) 2: 47 (12.2%) 3: 19 (4.9%) 4: 10 (2.6%) 5: 3 (0.8%) 6: 2 (0.5%) 8: 2 (0.5%) 9: 1 (0.3%) 10: 1 (0.3%) 13: 1 (0.3%) 14: 3 (0.8%) 15: 1 (0.3%) 19: 1 (0.3%) 23: 1 (0.3%) 118: 4 (1.0%) 120: 2 (0.5%) 1: 308 (80.2%) [by gene] 2: 34 (8.9%) [by gene] 3: 15 (3.9%) [by gene] 4: 9 (2.3%) [by gene] 5: 1 (0.3%) [by gene] 6: 2 (0.5%) [by gene] 7: 1 (0.3%) [by gene] 8: 1 (0.3%) [by gene] 9: 1 (0.3%) [by gene] 10: 1 (0.3%) [by gene] 14: 3 (0.8%) [by gene] 15: 1 (0.3%) [by gene] 17: 1 (0.3%) [by gene] 19: 1 (0.3%) [by gene] 66: 1 (0.3%) [by gene] 118: 4 (1.0%) [by gene]

pnrobinson commented 4 years ago

I have remove the heuristic and updated the manuscript. To close this issue, we still need to clean up the code. For instance, heuristicOneAlleleAR is a boolean var that is not needed anymore

pnrobinson commented 4 years ago

done