exomiser / Exomiser

A Tool to Annotate and Prioritize Exome Variants
https://exomiser.readthedocs.io
GNU Affero General Public License v3.0

PhenIX similarity score vs p-value #170

Open visze opened 7 years ago

visze commented 7 years ago

I managed to run a VCF through exomiser-web using PhenIX, but the results (phenotype score) look strange:

First variant: Phenotype score 1.0 and PhenIX semantic similarity score: 1.84 (p-value: 0.695560)
Second variant: Phenotype score 1.0 and PhenIX semantic similarity score: 1.84 (p-value: 0.319460)

So the Exomiser score and the PhenIX score are always the same, but the p-value differs (and it continues like that).

I think the representation "PhenIX semantic similarity score: 1.84 (p-value: 0.319460)" suggests that the p-value of the score 1.84 is 0.319460. But why do other variants with the exact same PhenIX semantic similarity score have a different p-value?

cc @drseb

drseb commented 7 years ago

This isn't the problem, @visze. It is correct that the same score can have different p-values. The problem is rather that the normalisation should be done over the p-value and not over the raw score, i.e. all variants with the smallest p-value should have a Phenotype score of 1.0.
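A minimal sketch of the difference between the two normalisations, using the two variants quoted at the top of this issue. The function names and the -log10 transform are assumptions made for illustration only; this is not how Exomiser is implemented.

```python
import math

# Hypothetical records: (variant label, raw PhenIX score, p-value).
# The numbers are the two examples quoted at the top of this issue.
results = [
    ("first variant", 1.84, 0.695560),
    ("second variant", 1.84, 0.319460),
]


def normalise_by_raw_score(rows):
    """Scale raw scores so that the highest raw score maps to 1.0."""
    best = max(score for _, score, _ in rows)
    return {name: (score / best if best > 0 else 0.0) for name, score, _ in rows}


def normalise_by_p_value(rows, floor=1e-300):
    """Scale -log10(p) so that the smallest p-value maps to 1.0.

    The -log10 transform is an assumption of this sketch; any mapping
    that decreases as p grows would make the same point.
    """
    # Clamp p to a tiny positive floor so that log10(0) cannot occur.
    strength = {name: -math.log10(max(p, floor)) for name, _, p in rows}
    best = max(strength.values())
    return {name: (s / best if best > 0 else 0.0) for name, s in strength.items()}


by_raw = normalise_by_raw_score(results)
by_p = normalise_by_p_value(results)

for name, _, _ in results:
    print(f"{name}: raw-normalised={by_raw[name]:.2f}, p-normalised={by_p[name]:.2f}")
```

With raw-score normalisation both variants end up at 1.0, which is the behaviour reported above. With p-value normalisation only the variant with the smallest p-value reaches 1.0.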

visze commented 7 years ago

That is not what I meant. Right now it is:

same score = different p-value

This is strange...

drseb commented 7 years ago

Sorry, I have to rephrase: it is correct that the same score can have different p-values. (I updated the comment above.)

pnrobinson commented 7 years ago

Consider that the possible range of scores for any disease depends on the annotations it has -- therefore, the "best" p-value depends on the individual annotation structure of each disease.
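To illustrate how the same score can map to different p-values, here is a small sketch. It assumes the p-value is an empirical one, computed against a per-disease distribution of scores, which this thread implies but does not spell out. The two null distributions are made-up numbers and `empirical_p_value` is a hypothetical helper, not a PhenIX function.

```python
def empirical_p_value(observed, null_scores):
    """Fraction of null scores that are at least as large as the observed score."""
    hits = sum(1 for s in null_scores if s >= observed)
    return hits / len(null_scores)


# Made-up null distributions for two hypothetical diseases.
# Disease A has an annotation structure that often yields high scores,
# disease B one that rarely does.
null_disease_a = [0.2, 0.5, 1.0, 1.5, 1.84, 1.9, 2.0, 2.2, 2.5, 3.0]
null_disease_b = [0.1, 0.3, 0.5, 0.8, 0.9, 1.0, 1.2, 1.5, 1.84, 2.0]

observed = 1.84  # the same raw similarity score for both diseases

p_a = empirical_p_value(observed, null_disease_a)
p_b = empirical_p_value(observed, null_disease_b)

print(f"disease A: score={observed}, p-value={p_a}")
print(f"disease B: score={observed}, p-value={p_b}")
```

The observed score is identical, but it is far less surprising for disease A than for disease B, so the two p-values differ.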

visze commented 7 years ago

Yep, this is what @drseb told me. But maybe we can also think about the representation, using two columns:

score
p-value

instead of "score (p-value)". I think we have to ask the users here.

drseb commented 7 years ago

OK, we have two things in this ticket. The first issue is the representation of the results, i.e. putting the p-value in brackets after the score. We should ask some users whether they find this confusing. (I think it is OK.)

The second issue is the way the score is normalised, as I tried to describe above.