NCATSTranslator / Feedback

A repo for tracking gaps in Translator data and finding ways to fill them.

Source Links #592

Closed (sstemann closed this issue 8 months ago)

sstemann commented 9 months ago

Questions about source links. This query has some examples: https://ui.test.transltr.io/main/results?l=Aicardi%20Syndrome&i=MONDO:0010568&t=0&q=d1d73626-7167-40e4-b38b-5f9cfef698eb

  1. Are Monarch Initiative and SRI Reference Knowledge Graph API the same source?

  2. Why are there no wiki pages for ARAX, DrugBank, Orphanet Rare Disease Ontology, and Drug Repurposing Hub?

https://ui.test.transltr.io/main/results?l=Psoriasis&i=MONDO:0005083&t=0&q=c58a990c-1536-4c98-b454-20f260625407

saramsey commented 9 months ago

To fix the ARAX infores URL, I have made a PR to the Biolink project area: https://github.com/biolink/biolink-model/pull/1403

karafecho commented 9 months ago

In response to your questions, @sstemann:

  1. Yes, I believe that Monarch Initiative and SRI Reference Knowledge Graph API are identical. My investigative work suggests that Monarch Initiative may have been an earlier version of the SRI Reference Knowledge Graph API, one that was incorrectly named, given that Monarch Initiative isn't even a knowledge source. This is a known issue that we are working to resolve.

  2. There are multiple issues here.

a. DrugBank and Drug Repurposing Hub - wiki pages exist, but the URLs in the infores catalog will need to be updated.

  - id: infores:drugbank
    status: released
    name: DrugBank
    xref:
      - http://www.drugbank.ca/
    synonym:
      - Drugbank
    knowledge level: curated
    agent type: not_provided
    description: >-
      A comprehensive, free-to-access, online database containing information
      on drugs and drug targets. As both a bioinformatics and a cheminformatics resource,
      we combine detailed drug (i.e. chemical, pharmacological and pharmaceutical) data
      with comprehensive drug target (i.e. sequence, structure, and pathway) information
  - id: infores:drug-repurposing-hub
    status: released
    name: Drug Repurposing Hub
    xref:
      - https://clue.io/repurposing
    knowledge level: curated
    agent type: not_provided
    description: >-
      Curated and annotated collection of FDA-approved drugs, clinical
      trial drugs, and pre-clinical tool compounds with a companion information resource

b. Orphanet Rare Disease Ontology - a wiki page will need to be created and the URL in the infores catalog will need to be updated

  - id: infores:ordo
    status: released
    name: Orphanet Rare Disease Ontology
    xref:
      - https://bioportal.bioontology.org/ontologies/ORDO
    synonym:
      - ORDO
    knowledge level: curated
    agent type: not_provided
  - id: infores:orphanet
    status: released
    name: Orphanet
    xref:
      - https://www.orpha.net
    knowledge level: curated
    agent type: not_provided

c. Expander Agent (ARAX) has a wiki page, but ARAX should not be surfacing as a primary knowledge source. Same with ARAGORN, Service Provider, and Unsecret. If ARAs decide to surface their reasoning agent as a primary knowledge source, then they should be pointing to their agent, not their team, as the source, and they should not be naming the source, e.g., "Unsecret Agent OpenAPI for NCATS Biomedical Data Translator Reasoners". This is also a known issue. All four of the above ARAs are aware of the issue.
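The URL problems in items a and b can be caught with a small audit over catalog stanzas. This is a minimal sketch, assuming the stanzas have already been parsed into Python dicts (for example with a YAML loader), and assuming that an entry counts as "linked" when one of its xrefs points under the Translator-All wiki prefix used in the infores:monarchinitiative stanza later in this thread. `entries_missing_wiki_link` is a hypothetical helper, not part of any Translator tool.

```python
# Hypothetical audit: flag released catalog entries whose xrefs never point at a wiki page.
# Assumption: Translator wiki pages live under this prefix (taken from the
# infores:monarchinitiative xref quoted in this thread).
WIKI_PREFIX = "https://github.com/NCATSTranslator/Translator-All/wiki/"


def entries_missing_wiki_link(catalog):
    """Return the ids of released entries with no xref under WIKI_PREFIX."""
    missing = []
    for entry in catalog:
        if entry.get("status") != "released":
            continue  # deprecated entries are out of scope for this check
        xrefs = entry.get("xref", [])
        if not any(url.startswith(WIKI_PREFIX) for url in xrefs):
            missing.append(entry["id"])
    return missing


# Stanzas transcribed from the catalog excerpts in this thread (descriptions omitted).
catalog = [
    {"id": "infores:drugbank", "status": "released",
     "xref": ["http://www.drugbank.ca/"]},
    {"id": "infores:drug-repurposing-hub", "status": "released",
     "xref": ["https://clue.io/repurposing"]},
    {"id": "infores:ordo", "status": "released",
     "xref": ["https://bioportal.bioontology.org/ontologies/ORDO"]},
    {"id": "infores:monarchinitiative", "status": "released",
     "xref": [WIKI_PREFIX + "SRI-Reference-Knowledge-Graph"]},
]

if __name__ == "__main__":
    print(entries_missing_wiki_link(catalog))
```

Checking the prefix rather than a full URL keeps the audit independent of individual page titles, which (as with the SRI Reference KG page) may be retitled.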

Just so you are aware, the infores / wiki effort is an ongoing one, with very few people contributing. Moreover, issues have surfaced and/or changes have been made that require attention / action by other teams, which we cannot control.

For the initial public release, we focused on those primary knowledge sources that were being returned by the ARS in response to select MVP1 (GARD) and MVP2 queries. Moving forward, the plan is described in tab two C10 here.

Hope this helps ...

kevinschaper commented 8 months ago

I might need some help from EvanDietzMorris to answer about the difference between infores:sri-reference-kg, infores:automat-renci-sri-reference-kg and infores:monarchinitiative and what it takes for one of those to show up in the UI.

I don't know if it's practical, but it might be nice to phase out the name SRI Reference Graph, which I think historically was a biolink model conversion of the Dipper-generated Monarch Initiative graph (and therefore had to be differentiated). As we've rebuilt our graph pipeline, however, the monarch graph and the SRI reference graph have become the same thing.

The two things that I'm not sure about:

Is the primary_knowledge_source preserved from the monarch-kg KGX files as it passes through ORION? Or is it replaced with infores:monarchinitiative?

Is there (still?) a KP that's built using api.monarchinitiative.org? This endpoint is still up, but it hasn't been updated for 2 years and will only stay up for a limited time. (The data files that backed it will still be hosted and available, of course.)

karafecho commented 8 months ago

@kevinschaper @EvanDietzMorris : You may find this G-sheet helpful, as it contains recent (late October) ARS/UI results from select MVP1/MVP2 queries.

Note that I think that infores:automat-renci-sri-reference-kg has been deprecated, but infores:sri-reference-kg and infores:monarchinitiative remain active. There's also infores:sri-ontology.

If you all decide to phase out infores:sri-reference-kg in favor of infores:monarchinitiative, I'd suggest that you rename the latter infores:monarch-kg or infores:monarch-initiative-kg.

EvanDietzMorris commented 8 months ago

Copying my response to the other issue about this:

This is due to the (admittedly complicated) fact that we have a version of the monarch sri-reference-kg that we ingest for robokop, which is a subset of the real sri-reference-kg. It was made from an older version of the graph that did not include the actual primary knowledge sources, so it gets infores:sri-reference-kg assigned as the primary source. So it's a bit of a mess: we currently have two redundant infores ids in the catalog for this content (infores:monarchinitiative and infores:sri-reference-kg), but neither should be returned as a primary knowledge source. It should always be an aggregator, assuming every edge in the content has its own primary knowledge source. We'll need to rebuild our version of this graph, and to pick which infores we want to use as the proper aggregator knowledge source for this content.
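The rule that an aggregated graph's own infores should never surface as the primary knowledge source can be checked with a simple validation pass. A minimal sketch, assuming edges are dicts carrying `biolink:primary_knowledge_source` and `biolink:aggregator_knowledge_source` keys (as in KGX-style edge records) and that the set of aggregator-only ids is maintained by hand; `misattributed_edges` and the `AGGREGATOR_ONLY` set are hypothetical.

```python
# Hypothetical check: ids that should only ever appear as aggregators.
AGGREGATOR_ONLY = {"infores:monarchinitiative", "infores:sri-reference-kg"}


def misattributed_edges(edges, aggregator_only=AGGREGATOR_ONLY):
    """Return the ids of edges whose primary source is an aggregator-only infores."""
    return [
        edge["id"]
        for edge in edges
        if edge.get("biolink:primary_knowledge_source") in aggregator_only
    ]


edges = [
    # Edge that carries its own upstream primary source: correct.
    {"id": "e1",
     "biolink:primary_knowledge_source": "infores:omim",
     "biolink:aggregator_knowledge_source": ["infores:monarchinitiative"]},
    # Edge from the older subset, where the graph itself was assigned as primary.
    {"id": "e2",
     "biolink:primary_knowledge_source": "infores:sri-reference-kg"},
]

if __name__ == "__main__":
    print(misattributed_edges(edges))  # ['e2']
```

The same pass works for the ARA case discussed earlier by passing a different `aggregator_only` set.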

And adding here: the graph I get from Kevin should currently end up with the following EPC for an edge:

  - primary knowledge source: whatever is on the edge
  - aggregator knowledge source 1: infores:monarchinitiative
  - aggregator knowledge source 2: infores:automat-sri-reference-kg
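An EPC like this is an ordered chain: each aggregator takes its edges from the resource before it. The sketch represents the chain as a list of source records modelled on TRAPI's `sources` (RetrievalSource) objects, with `resource_id`, `resource_role`, and `upstream_resource_ids`. Treat those field names as an assumption to verify against the TRAPI schema version in use; `build_provenance_chain` is hypothetical, and infores:omim stands in for "whatever is on the edge".

```python
def build_provenance_chain(primary, aggregators):
    """Return source records where each aggregator points at the record before it."""
    chain = [{
        "resource_id": primary,
        "resource_role": "primary_knowledge_source",
        "upstream_resource_ids": [],
    }]
    for aggregator in aggregators:
        chain.append({
            "resource_id": aggregator,
            "resource_role": "aggregator_knowledge_source",
            # Chained, not parallel: the upstream is the immediately preceding resource.
            "upstream_resource_ids": [chain[-1]["resource_id"]],
        })
    return chain


if __name__ == "__main__":
    epc = build_provenance_chain(
        "infores:omim",  # example primary; in practice taken from the edge
        ["infores:monarchinitiative", "infores:automat-sri-reference-kg"],
    )
    for record in epc:
        print(record["resource_role"], record["resource_id"], record["upstream_resource_ids"])
```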

Sounds like we need to:

  1. Decide what the aggregator knowledge source referring to edges coming from this graph should actually be. infores:monarchinitiative?
  2. We at RENCI need to update our version of the graph and fix its provenance to use the primary source from the edges and the correct aggregator.
  3. Figure out what the automat infores ids for these two separate instances should be. It looks like the one currently used for our version, infores:automat-biolink (poorly named for historical reasons), was removed from the catalog.

karafecho commented 8 months ago

Thanks for the clarification, Evan.

Those of us who are working on the infores / wiki reconciliation and update effort (me, Andy, Sierra, Matt, Carrie) would appreciate it if you did not refer to infores:monarchinitiative as an aggregator knowledge source. The reason is that monarchinitiative is a group/org, not a KG. If possible, and should you choose to consider infores:monarchinitiative as the aggregator knowledge source, then perhaps change the infores id to infores:monarchinitiative-kg or infores:monarch-initiative-kg.

EvanDietzMorris commented 8 months ago

I don’t have any strong opinions here. Sounds like something like this would work? Both the monarch initiative team and renci need to rebuild graphs before this could be deployed though.

  - infores:monarch-initiative-kg (as the first aggregator for everything coming from this KG)
  - infores:automat-monarch-initiative-kg (as a second aggregator for the whole graph hosted on automat)
  - infores:automat-renci-monarch-initiative-kg (as a second aggregator for the RENCI version)

kevinschaper commented 8 months ago

Oh! I'm putting infores:monarchinitiative as an aggregator knowledge source on nearly everything in monarch-kg (not the subset that comes from phenio, but that feels like a bug I should fix).

I like the idea of infores:monarch-kg (+ infores:automat-monarch-kg & infores:automat-renci-monarch-kg), just omitting the -initiative part, because we haven't ever referred to the graph that way before.

fwiw, it seems like the KG should come with infores:monarch-kg populated for all edges, and then the automat & automat-renci pipelines should only add their own names to the aggregator list.

Also, are the aggregator values already present in the KG preserved? For example, we get OMIM & Orphanet via HPOA files, so we use infores:omim & infores:orphanet as primary, and add infores:hpo-annotations as an aggregator (along with infores:monarchinitiative - which I can switch to infores:monarch-kg)

EvanDietzMorris commented 8 months ago

fwiw, it seems like the KG should come with infores:monarch-kg populated for all edges, and then the automat & automat-renci pipelines should only add their own names to the aggregator list.

This is how it works now, with the exception that if an edge is missing a primary knowledge source the automat infores will get assigned as one (which should really never be the case).
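The behaviour described here (keep the edge's own primary source, preserve any aggregators already on it, append the hosting pipeline's infores, and fall back to the pipeline's infores as primary only when an edge has none) could look roughly like this. A sketch only: `annotate_edge` is hypothetical and is not ORION's or Plater's actual code; infores:automat-monarch-kg is one of the ids proposed in this thread.

```python
def annotate_edge(edge, pipeline_infores):
    """Return a copy of edge with the hosting pipeline's provenance added."""
    annotated = dict(edge)
    # Copy so the caller's aggregator list is never mutated.
    aggregators = list(annotated.get("biolink:aggregator_knowledge_source", []))
    if not annotated.get("biolink:primary_knowledge_source"):
        # Fallback: the pipeline becomes primary (should really never happen).
        annotated["biolink:primary_knowledge_source"] = pipeline_infores
    elif pipeline_infores not in aggregators:
        # Normal case: keep existing aggregators and add our own name last.
        aggregators.append(pipeline_infores)
    annotated["biolink:aggregator_knowledge_source"] = aggregators
    return annotated


if __name__ == "__main__":
    hpoa_edge = {
        "biolink:primary_knowledge_source": "infores:omim",
        "biolink:aggregator_knowledge_source": ["infores:hpo-annotations", "infores:monarchinitiative"],
    }
    print(annotate_edge(hpoa_edge, "infores:automat-monarch-kg"))
```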

Also, are the aggregator values already present in the KG preserved?

This is currently a limitation of plater - because we don't have any examples of multiple aggregators chained before an edge gets into one of our knowledge graphs, we don't have a way to represent that which plater understands at the moment. We were planning to implement it very soon though. Currently only one aggregator can be specified per edge as "biolink:aggregator_knowledge_source". I'm curious how you have multiple ones represented now. Let's discuss on Slack - if we need to rebuild the graphs anyway, we can go ahead and implement a solution that will support chaining.
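To see why a single list-valued aggregator field cannot represent chaining, the sketch flattens two differently ordered chains and shows they collapse to the same record. It assumes the flat field is read as an unordered set of parallel aggregators; `flatten` is hypothetical, the first chain follows Kevin's HPOA example (OMIM via HPO annotations via Monarch), and the reversed chain is invented purely for contrast.

```python
def flatten(primary, chain):
    """Collapse an ordered aggregator chain into flat, order-free edge properties."""
    return {
        "biolink:primary_knowledge_source": primary,
        # Set-like list: the order (who aggregated from whom) is discarded.
        "biolink:aggregator_knowledge_source": sorted(set(chain)),
    }


# HPOA-derived edge: OMIM -> HPO annotations -> Monarch KG.
actual = ["infores:hpo-annotations", "infores:monarchinitiative"]
# Hypothetical chain with the same members in the opposite order.
reversed_chain = list(reversed(actual))

if __name__ == "__main__":
    same = flatten("infores:omim", actual) == flatten("infores:omim", reversed_chain)
    print(same)  # True: the flat form cannot distinguish the two chains
```

Supporting chaining therefore needs per-aggregator upstream information, not just a longer list.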

EvanDietzMorris commented 8 months ago

To clarify: after talking with Kevin, I realized I misspoke somewhat. We can support multiple aggregators in the biolink:aggregator_knowledge_source field, but it assumes that they are parallel, not chained together. Either way, I think we know what needs to be done here:

EvanDietzMorris commented 8 months ago

After some discussion we have decided to remove the RENCI version of the graph from automat completely (though we will still be including edges from it within the robokop kg). This will eliminate complexity and confusion stemming from having two versions etc.

So now we just need to change the infores ids to infores:monarch-kg and infores:automat-monarch-kg and rebuild the monarch kg with edges carrying the new infores. I'll remove "biolink" (our version of this) from plater, along with any infores ids associated with it.

sierra-moxon commented 8 months ago

I would vote to leave the infores id as infores:monarchinitiative vs changing it to infores:monarch-kg. Changing this id would necessitate me trying to update several other infores sources currently defined at the "organization" level vs. at the "kg" level, as well as handling several deprecations of identifiers (and managing the dissemination of the deprecation through the rest of the KPs that might use those deprecated identifiers), and making sure we define "kg" at a granular enough level so that everyone is on the same page about what "-kg" means.
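Disseminating a deprecation is essentially a mapping problem: every KP that still emits a deprecated id has to rewrite it. A minimal sketch of such a rewrite, assuming a hand-maintained mapping. The single entry shown (infores:sri-reference-kg to infores:monarchinitiative) is hypothetical and only mirrors the consolidation floated in this thread, not a catalog decision; `remap_sources` is likewise hypothetical.

```python
# Hypothetical deprecation map: deprecated infores id -> replacement id.
DEPRECATED = {"infores:sri-reference-kg": "infores:monarchinitiative"}


def remap_sources(edge, deprecated=DEPRECATED):
    """Return a copy of edge with deprecated infores ids replaced."""
    remapped = dict(edge)
    primary = remapped.get("biolink:primary_knowledge_source")
    if primary is not None:
        remapped["biolink:primary_knowledge_source"] = deprecated.get(primary, primary)
    aggregators = []
    for infores in remapped.get("biolink:aggregator_knowledge_source", []):
        new_id = deprecated.get(infores, infores)
        if new_id not in aggregators:  # merging two ids must not create duplicates
            aggregators.append(new_id)
    remapped["biolink:aggregator_knowledge_source"] = aggregators
    return remapped


if __name__ == "__main__":
    edge = {
        "biolink:primary_knowledge_source": "infores:omim",
        "biolink:aggregator_knowledge_source": ["infores:sri-reference-kg", "infores:monarchinitiative"],
    }
    print(remap_sources(edge))
```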

What if we add the "-kg" version to the synonym field? Would that be a good compromise?

This is the current infores:monarchinitiative stanza. Does the wiki URL need to change?

  - id: infores:monarchinitiative
    status: released
    name: Monarch Initiative
    xref:
      - https://github.com/NCATSTranslator/Translator-All/wiki/SRI-Reference-Knowledge-Graph
    knowledge level: curated
    agent type: not_provided
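Under the synonym compromise, the id stays infores:monarchinitiative and tools resolve any synonym back to that one canonical id. A sketch, assuming stanzas parsed into dicts with an optional `synonym` list, as in the DrugBank and ORDO entries; the "infores:monarch-kg" synonym is the hypothetical addition under discussion, not something in the catalog, and `build_resolver` is hypothetical.

```python
def build_resolver(catalog):
    """Map each id, name, and synonym (case-insensitively) to its canonical infores id."""
    lookup = {}
    for entry in catalog:
        keys = [entry["id"], entry.get("name", "")] + entry.get("synonym", [])
        for key in keys:
            if key:  # skip missing names
                lookup[key.lower()] = entry["id"]
    return lookup


catalog = [
    {"id": "infores:monarchinitiative", "name": "Monarch Initiative",
     "synonym": ["infores:monarch-kg"]},  # hypothetical synonym
    {"id": "infores:drugbank", "name": "DrugBank", "synonym": ["Drugbank"]},
]

resolver = build_resolver(catalog)

if __name__ == "__main__":
    print(resolver["infores:monarch-kg"])  # infores:monarchinitiative
```

Keeping the id fixed and growing the synonym list avoids the deprecation fan-out, at the cost of every consumer needing a resolver like this.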
karafecho commented 8 months ago

@sierra-moxon : Based on your post here and various Slack exchanges, I think you are fine with Evan moving forward with deprecation of the duplicate RENCI Monarch graph, but you have concerns about changing the infores id from infores:monarchinitiative to infores:monarch-kg, in part because you are not comfortable with the tag -kg. Is that right? If so, then it sounds like Evan can move forward with deprecation of the duplicate RENCI Monarch graph but maybe leave the infores id as infores:monarchinitiative for the time being, until a broader discussion takes place.

Regardless, the URL will need to be changed/retitled, but I can create a PR for that as part of the infores / wiki effort.

sierra-moxon commented 8 months ago

Thanks @karafecho - that would be great, and it does sum up my comment nicely. :)

EvanDietzMorris commented 8 months ago

We have already removed our version of the graph from automat, but edges with “sri-reference-kg” will still be coming from robokop until we build robokop again. For the future those edges will just have whatever comes directly from the monarch kg.

karafecho commented 8 months ago

FYI: I created a PR to change the URLs for DrugBank, Drug Repurposing Hub, Orphanet Rare Disease Ontology (I created a new wiki page), and Monarch Initiative (retitled from SRI Reference KG) and also deprecate infores:automat-renci-sri-kg.

Update: Sierra merged the PR.

dnsmith124 commented 8 months ago

Per discussion on the TAQA call on 11/17:

I've assigned myself to this ticket to keep track of this deployment, and will update once it's complete.

karafecho commented 8 months ago

@EvanDietzMorris : A few of us, including Sarah and Sierra, met to discuss this ticket as part of TAQA. We decided to keep infores:monarchinitiative and NOT change it to infores:monarchinitiative-kg.

As such, I am closing this ticket.