Phenomics / ontolib

A modern Java library for working with (biological) ontologies.
https://ontolib.readthedocs.org

Add ability to update score distributions #8

Closed julesjacobsen closed 7 years ago

julesjacobsen commented 7 years ago

Phenix currently uses some ancient score distributions generated against an HPO from back in the old days. We really need these to be updatable by the user, similar to how Jannovar allows updating of the ser files.

Ideally this mechanism should be both command-line driven and programmatically accessible as a library, so that other code (such as an Exomiser build) can call it too.

holtgrewe commented 7 years ago

Fixed in #19.

One question remains: how precisely do we want to down-sample the empirical distributions for semantic similarity scores? The non-resampled data takes ~1.4GB; when resampling to 200 points, this goes down to 220MB. The question is also how to downsample. The current code in H2ScoreDistributionWriter.java samples to every (target points)/(old count) points. @pnrobinson @drseb
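
To make the downsampling question concrete, here is a minimal sketch of one possible approach: pick evenly spaced entries from a sorted array of scores, always keeping the smallest and largest value. This is not the code in H2ScoreDistributionWriter.java; the class `DownsampleSketch` and the method `downsample` are hypothetical names used for illustration only.

```java
import java.util.Arrays;

// Hypothetical illustration, not part of ontolib.
public class DownsampleSketch {

    /**
     * Returns targetPoints entries taken at evenly spaced positions from a
     * sorted array, always including the first and the last entry. If the
     * input already has at most targetPoints entries, a copy is returned.
     */
    static double[] downsample(double[] sorted, int targetPoints) {
        if (targetPoints < 2) {
            throw new IllegalArgumentException("targetPoints must be at least 2");
        }
        if (sorted.length <= targetPoints) {
            return Arrays.copyOf(sorted, sorted.length);
        }
        double[] result = new double[targetPoints];
        for (int i = 0; i < targetPoints; i++) {
            // Map output index i onto the input index range [0, sorted.length - 1].
            int idx = (int) Math.round((double) i * (sorted.length - 1) / (targetPoints - 1));
            result[i] = sorted[idx];
        }
        return result;
    }

    public static void main(String[] args) {
        // Synthetic, already sorted scores; real data would come from the precomputation.
        double[] scores = new double[1000];
        for (int i = 0; i < scores.length; i++) {
            scores[i] = i / 1000.0;
        }
        double[] sampled = downsample(scores, 200);
        System.out.println(sampled.length + " points, first=" + sampled[0]
            + ", last=" + sampled[sampled.length - 1]);
    }
}
```

Keeping both end points means the observed minimum and maximum scores survive the resampling; whether that matters for the p-value lookup is a design decision for the maintainers.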

drseb commented 7 years ago

Previously, I rounded to 4 decimal places.
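
One way to read this approach, sketched under assumptions: round each score to 4 decimal places and keep only the distinct rounded values, so that scores differing only beyond the fourth decimal collapse into one point. The class `RoundingSketch` and its methods are hypothetical and do not reflect the earlier implementation.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.TreeSet;

// Hypothetical illustration, not part of ontolib.
public class RoundingSketch {

    /** Rounds a finite score to 4 decimal places (half up). */
    static double roundTo4(double score) {
        return BigDecimal.valueOf(score).setScale(4, RoundingMode.HALF_UP).doubleValue();
    }

    /** Returns the distinct rounded scores in ascending order. */
    static TreeSet<Double> distinctRounded(double[] scores) {
        TreeSet<Double> distinct = new TreeSet<>();
        for (double s : scores) {
            distinct.add(roundTo4(s));
        }
        return distinct;
    }

    public static void main(String[] args) {
        // Made-up example scores, for demonstration only.
        double[] scores = {0.12341, 0.12342, 0.5, 1.23456};
        System.out.println(distinctRounded(scores));
    }
}
```

Unlike sampling a fixed number of points, the size of the result here depends on how the scores are spread, so the storage saving is not known in advance.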

julesjacobsen commented 7 years ago

Thanks for doing this - I've added a comment on the PR too.