TranslatorSRI / NodeNormalization

Service that produces Translator compliant nodes given a curie
MIT License

Missing Information Content #182

Open EvanDietzMorris opened 1 year ago

EvanDietzMorris commented 1 year ago

I used to get an information content property when normalizing HGNC:7432 but it seems to have disappeared.

gaurav commented 7 months ago

There are two issues here:

  1. HGNC:7432 is cliqued with NCBIGene:4522, which has an info-content value of 100. But this isn't being associated with the clique correctly. It looks like NCBIGene identifiers aren't getting associated with info content values because we use a pretty simple algorithm to try to map from CURIEs to the URLs that Ubergraph uses. This works for PURL URLs, but not for NCBIGene URLs, which are in the form https://identifiers.org/ncbigene/4522.
  2. HGNC:7432 is conflated with PR:P11586 under gene-protein conflation, and PR:P11586 has an info-content value of 92.7. You can get this value by looking up the protein identifier (with gene-protein conflation turned on or off), but not by looking up the gene identifier. I'm not sure where/how conflation intersects with information content values, but it looks like we're not looking up information content values correctly for conflated identifiers.
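
To illustrate the first issue, here is a minimal sketch of how a simple CURIE-to-URL mapping can miss NCBIGene. This is not the actual NodeNormalization code: the function names, the `PREFIX_OVERRIDES` table, and the assumption that the simple algorithm builds OBO-style PURLs are all hypothetical. Only the two example identifiers and the identifiers.org URL form come from this thread.

```python
# Hypothetical sketch; not NodeNormalization's actual implementation.

# Assumed base for OBO-style PURLs (prefix and local ID joined by "_").
OBO_PURL_BASE = "http://purl.obolibrary.org/obo/"

# Hypothetical per-prefix overrides for prefixes that do not use OBO PURLs.
PREFIX_OVERRIDES = {
    "NCBIGene": "https://identifiers.org/ncbigene/",
}


def naive_curie_to_iri(curie: str) -> str:
    """Expand every CURIE as if it were an OBO PURL (the assumed simple algorithm)."""
    prefix, local_id = curie.split(":", 1)
    return f"{OBO_PURL_BASE}{prefix}_{local_id}"


def curie_to_iri(curie: str, overrides: dict = PREFIX_OVERRIDES) -> str:
    """Expand a CURIE, consulting the override table before falling back to PURLs."""
    prefix, local_id = curie.split(":", 1)
    base = overrides.get(prefix)
    if base is not None:
        return f"{base}{local_id}"
    return naive_curie_to_iri(curie)


if __name__ == "__main__":
    # Works for an OBO-style identifier either way.
    print(naive_curie_to_iri("PR:P11586"))
    # The naive expansion does not produce the identifiers.org form.
    print(naive_curie_to_iri("NCBIGene:4522"))
    # The override table produces the form quoted above.
    print(curie_to_iri("NCBIGene:4522"))
```

Under these assumptions, the naive expansion of NCBIGene:4522 never matches the identifiers.org URL, so the info content lookup would come back empty even though a value exists.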

gaurav commented 6 months ago

Note also that Biolink Model and UberGraph disagree on whether NCBIGene should have an HTTP or HTTPS URL, see https://github.com/biolink/biolink-model/issues/1431
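
One possible way to tolerate the HTTP/HTTPS mismatch is to try both schemes when looking up a value. The sketch below is only an illustration: `scheme_variants`, `lookup_info_content`, and the dictionary-based lookup table are hypothetical names and structures, not part of NodeNormalization or UberGraph.

```python
# Hypothetical sketch; the table layout and function names are assumptions.
from typing import Dict, List, Optional


def scheme_variants(iri: str) -> List[str]:
    """Return the IRI as given, followed by the same IRI with the other scheme."""
    if iri.startswith("https://"):
        return [iri, "http://" + iri[len("https://"):]]
    if iri.startswith("http://"):
        return [iri, "https://" + iri[len("http://"):]]
    return [iri]


def lookup_info_content(iri: str, table: Dict[str, float]) -> Optional[float]:
    """Look up an info content value, accepting either scheme for the key."""
    for candidate in scheme_variants(iri):
        if candidate in table:
            return table[candidate]
    return None


if __name__ == "__main__":
    # Example table keyed by the HTTP form; 100 is the value quoted in this issue.
    table = {"http://identifiers.org/ncbigene/4522": 100.0}
    print(lookup_info_content("https://identifiers.org/ncbigene/4522", table))
```

This only papers over the disagreement on the lookup side; settling on one scheme upstream would avoid the need for it.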