Publication support for edges in answer subgraphs

karafecho commented 1 year ago

This issue is to report that publication support for edges is largely absent from ROBOKOP results. Relatedly, when I run a multi-hop query, I cannot click on edges in the "Answer Explorer" to find additional publication support, nor can I determine the provenance of any of the edges or nodes. Answers are ranked, but it's not clear what the rankings are based on. These are probably known issues, but I wanted to report them and suggest that the team prioritize them.

This issue is a bit packed and may need to be split into separate issues.

karafecho commented 1 year ago

Per Chris: The lack of publication support may reflect the pubs coming from direct edges only, rather than from direct edges plus supporting edges.

karafecho commented 1 year ago

See #105 and #106.

karafecho commented 1 year ago

Per Max: The publication support can be found in the JSON response, but it is currently not displaying in the UI.

karafecho commented 1 year ago

Additional information, excerpted from Morton, Fecho et al. 2019 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6954664/), and related to the current issue regarding lack of publication support for edges:

2.3 Answer-ranking algorithm

Queries that are generated with little specification regarding nodes and edges or with multiple nodes and edges typically result in numerous matching sub-graphs. As such, the rank of sub-graphs by relevance to the query and strength of the supporting evidence is critical for user exploration of results. The ROBOKOP answer-ranking algorithm weights each edge within each sub-graph using a metric that is based on the number of PubMed abstracts that cite both the source and target nodes. The publication support is provided by an additional ROBOKOP service, termed OmniCorp, that contains a graph of PubMed identifiers linked to concepts (i.e. potential ROBOKOP KG nodes) referenced within abstracts. OmniCorp is built by processing all PubMed abstracts with the SciGraph Named Entity Recognition API (https://github.com/SciGraph/SciGraph/) and matching text in titles and abstracts to concepts from a predetermined set of biomedical ontologies. A confidence score for each answer is calculated based on the resistance distance (Klein and Randić, 1993) between leaves of the answer sub-graph, using weights derived from the publication counts provided by curated data sources and publication co-occurrence counts provided by OmniCorp, with the former treated with greater importance than the latter.
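To make the resistance-distance idea in the excerpt more concrete, here is a minimal sketch in Python. It is not the ROBOKOP implementation. It assumes each edge weight acts as a conductance (a larger publication-derived weight means lower resistance), and it does not attempt to reproduce how ROBOKOP derives weights from publication counts or turns the distance into a confidence score. The function name `resistance_distance` and the example weights are hypothetical.

```python
from fractions import Fraction


def resistance_distance(n, weighted_edges, a, b):
    """Effective resistance between nodes a and b of a connected, undirected graph.

    n              -- number of nodes, labelled 0..n-1
    weighted_edges -- iterable of (u, v, w); w > 0 is treated as a conductance
                      (ASSUMPTION: stands in for a publication-derived edge weight)
    """
    if a == b:
        return Fraction(0)

    # Build the weighted graph Laplacian.
    lap = [[Fraction(0)] * n for _ in range(n)]
    for u, v, w in weighted_edges:
        w = Fraction(w)
        lap[u][u] += w
        lap[v][v] += w
        lap[u][v] -= w
        lap[v][u] -= w

    # Ground node b (drop its row and column), inject unit current at a,
    # and solve the reduced system; the potential at a is the resistance.
    keep = [i for i in range(n) if i != b]
    m = len(keep)
    aug = [
        [lap[r][c] for c in keep] + [Fraction(1 if r == a else 0)]
        for r in keep
    ]

    # Gauss-Jordan elimination with exact rational arithmetic.
    for col in range(m):
        pivot = next((r for r in range(col, m) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("graph must be connected")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]
        for r in range(m):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]

    return aug[keep.index(a)][m]


# Hypothetical two-hop answer: node 0 -- node 1 -- node 2, leaves are 0 and 2.
weak = resistance_distance(3, [(0, 1, 1), (1, 2, 1)], 0, 2)
strong = resistance_distance(3, [(0, 1, 4), (1, 2, 4)], 0, 2)
```

Under these assumptions, the answer whose edges carry larger weights has the smaller resistance distance between its leaves (`strong < weak`), which is the sense in which better-supported sub-graphs would rank higher.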

karafecho commented 1 year ago

Also note that the description above can be used to resolve #106.

karafecho commented 1 year ago

Done! I added text on the scoring-and-ranking algorithm to the About page and Quick Start Guide.

Missing publication support remains an issue.

cbizon commented 6 months ago

I'm not sure what we can do here. If the source doesn't have publications, it doesn't have publications... @eKathleenCarter maybe we should review the sources to make sure that we're getting everything that there is...

karafecho commented 6 months ago

I think this issue was due to the UI not displaying all of the pub information available in the JSON response; that has since been resolved. The other issue noted here concerned documentation of the scoring-and-ranking algorithm, which has also been resolved. As such, I'm closing the ticket ...
