NCATS-Gamma / robokop

Master UI for ROBOKOP
MIT License

ranking ignoring support edges? #477

Closed: cbizon closed this issue 4 years ago

cbizon commented 4 years ago

https://robokop.renci.org/a/097cb001-7caa-4d21-9f46-9ddfaff28e6b_03d17ce3-0bf0-433f-ba50-d482726e66a2/

The query is (SVIL gene)-(variant)-(phenotype) (autism). So there's no edge in the query from autism to anything else. But some phenotypes will have a support edge going to autism.

For instance, 'intelligence' shares 1100 publications with autism, but it gets the same score as 'emphysema pattern measurement', which shares none. And both get a lower score than some other answers that also share none (but have an extra 'real' edge between the variant and the gene).

But shouldn't those 1100 shared publications count for a lot? Or at least for something?

cbizon commented 4 years ago

Here's another example

https://robokop.renci.org/a/1f7c30db-6430-4b9b-92bb-346e2fdb7dfc_b02549c5-d100-42f7-84a9-0b1d37b17d08/

This is just brain -has_part-> cell

There are lots of publication counts, and it's possible that, because of the normalization, the scores wouldn't be monotonic in the number of publications shared by the brain and the cell. But the scores are all identical, which seems far less plausible.
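To make the monotonicity point concrete, here is a minimal sketch of one hypothetical normalization: a cosine-style ratio that divides the shared-publication count by the geometric mean of each node's total publication count. This is an assumption for illustration, not necessarily what ROBOKOP uses, and all counts below are made up. A cell with more shared publications can end up with a smaller weight, yet the weights still differ, so a normalization of this kind would not by itself explain every score being identical.

```python
import math


def normalized_weight(shared, total_a, total_b):
    """Hypothetical cosine-style normalization of a co-publication count."""
    if total_a == 0 or total_b == 0:
        return 0.0
    return shared / math.sqrt(total_a * total_b)


brain_total = 10_000  # hypothetical total publication count for "brain"

# (cell name, publications shared with brain, total publications for the cell);
# all values are invented for illustration.
cells = [("cell A", 50, 100), ("cell B", 200, 10_000), ("cell C", 1_000, 10_000)]

for name, shared, total in cells:
    # cell B shares more publications than cell A but gets a smaller weight,
    # because its own total publication count is much larger.
    print(name, shared, round(normalized_weight(shared, brain_total, total), 3))
```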

patrickkwang commented 4 years ago

The brain->cell example seems to have been fixed by some recent update: https://robokop.renci.org/a/1f7c30db-6430-4b9b-92bb-346e2fdb7dfc_0b09fe84-b1f1-4890-86da-45e2a3f2397f/

I'm still looking into the autism example.

patrickkwang commented 4 years ago

This results from a bug in scoring answers with disconnected nodes, like autism in the above example. In https://github.com/NCATS-Gamma/robokop-messenger/commit/b759b28b346df4765d14d4898e48b1ede689853f this is corrected, and such answers are given a score of 0. This is perhaps undesirable, and is addressed by #481.
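The thread doesn't spell out how the scorer works, so the following is a hypothetical sketch only. It assumes answers are scored by the effective conductance between two query nodes in a weighted answer graph (all function names and weights are ours, not robokop-messenger's). With this kind of score, grounding one node and solving the reduced Laplacian fails when the other node sits in a different component, because that component's Laplacian block is singular. Checking connectivity first and returning 0 mirrors the fix described in the comment above.

```python
from collections import deque
from fractions import Fraction


def component(n, edges, start):
    """Nodes reachable from `start` via positive-weight edges (BFS)."""
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        if w > 0:
            adj[u].append(v)
            adj[v].append(u)
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def solve(a, b):
    """Gauss-Jordan elimination with exact Fraction arithmetic."""
    m = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(m):
        pivot = next((r for r in range(col, m) if aug[r][col] != 0), None)
        if pivot is None:
            raise ZeroDivisionError("singular Laplacian")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col and aug[r][col] != 0:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][m] / aug[i][i] for i in range(m)]


def effective_conductance(n, edges, s, t):
    """Hypothetical score: effective conductance between nodes s and t."""
    if s == t:
        raise ValueError("s and t must differ")
    comp = component(n, edges, t)
    if s not in comp:
        return Fraction(0)  # disconnected: no current can flow, score 0
    nodes = sorted(comp - {t})  # ground t at potential 0
    index = {node: i for i, node in enumerate(nodes)}
    m = len(nodes)
    lap = [[Fraction(0)] * m for _ in range(m)]
    for u, v, w in edges:
        w = Fraction(w)
        if w <= 0 or u not in comp:
            continue
        # Build the Laplacian restricted to the non-grounded nodes.
        for a, b in ((u, v), (v, u)):
            if a in index:
                lap[index[a]][index[a]] += w
                if b in index:
                    lap[index[a]][index[b]] -= w
    rhs = [Fraction(0)] * m
    rhs[index[s]] = Fraction(1)  # inject unit current at s
    potentials = solve(lap, rhs)
    return 1 / potentials[index[s]]  # conductance = 1 / resistance


# Hypothetical answer graph: 0=gene, 1=variant, 2=phenotype, 3=autism.
query_edges = [(0, 1, 1), (1, 2, 1)]
print(effective_conductance(4, query_edges, 0, 3))  # autism disconnected -> 0
print(effective_conductance(4, query_edges + [(2, 3, 1)], 0, 3))  # support edge -> 1/3
```

In this sketch, a support edge from the phenotype to autism (the 'intelligence' case) is what connects autism to the rest of the answer, so answers with and without such an edge would no longer collapse to the same score.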