RTXteam / RTX-KG2

Build system for the RTX-KG2 biomedical knowledge graph, part of the ARAX reasoning system (https://github.com/RTXTeam/RTX)
MIT License

devise a way to store GO-type evidence codes for edges in KG2 #40

Open saramsey opened 4 years ago

saramsey commented 4 years ago

Requested by the UAB/PMI team.

ecwood commented 4 years ago

@saramsey Is there currently a spot in KG2 to store ECO codes, evidence codes, confidence scores, etc.? I'm assuming it would go in the publications info dictionary, but I don't know what the key would be. (GO -- RTXteam/RTX#838 -- has ECO codes and evidence codes; CTD -- RTXteam/RTX-KG2#39 -- has confidence scores.)

saramsey commented 4 years ago

I am not sure; I will ask the Data Modeling Group

saramsey commented 4 years ago

OK I have posted about this on the Translator slack, in the #datamodeling channel:

[image: Screen Shot 2020-08-22 at 9 36 27 PM]

saramsey commented 4 years ago

Here is Deepak's reply:

[image: Screen Shot 2020-08-24 at 10 17 06 AM]

Apparently we should use the evidence attribute.

saramsey commented 4 years ago

In KG2, let's store the GO evidence code in an edge slot evidence. I think we should store the GO evidence code as a CURIE ID, like this: GO.EC:IDA. In curies-to-urls-map.yaml, we can map the CURIE prefix GO.EC to the base URL http://www-legacy.geneontology.org/GO.evidence.shtml#
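To make the proposed mapping concrete, here is a minimal sketch of how a GO.EC CURIE would expand to a URL under that scheme. The dictionary is only a hypothetical in-memory stand-in for the relevant entry of curies-to-urls-map.yaml (whose actual layout is not shown in this thread), and the function names are made up for illustration.

```python
# Hypothetical stand-in for the GO.EC entry proposed for curies-to-urls-map.yaml.
CURIE_PREFIX_TO_URL = {
    "GO.EC": "http://www-legacy.geneontology.org/GO.evidence.shtml#",
}


def make_go_evidence_curie(evidence_code: str) -> str:
    """Build a CURIE such as GO.EC:IDA from a bare GO evidence code."""
    return "GO.EC:" + evidence_code.strip().upper()


def expand_curie(curie: str, prefix_map: dict) -> str:
    """Expand a CURIE to a URL by replacing its prefix with the mapped base URL."""
    # Split on the first colon only; the prefix itself may contain dots.
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in prefix_map:
        raise ValueError("unknown or malformed CURIE: " + repr(curie))
    return prefix_map[prefix] + local_id


url = expand_curie(make_go_evidence_curie("ida"), CURIE_PREFIX_TO_URL)
```

With this mapping, GO.EC:IDA resolves to the IDA anchor on the legacy evidence-code page.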

ecwood commented 4 years ago

Hi @saramsey, will both GO evidence codes and ECO evidence codes go in this field? If so, should it be a list? Proposed change to kg2_util:

def make_edge(subject_id: str,
              object_id: str,
              relation_curie: str,
              predicate_label: str,
              provided_by: str,
              update_date: str = None):

    return {'subject': subject_id,
            'object': object_id,
            'edge_label': predicate_label,
            'relation': relation_curie,
            'negated': False,
            'publications': [],
            'publications_info': {},
            'update_date': update_date,
            'provided_by': provided_by,
            'evidence': []}

evidence can be manipulated within each ETL script as necessary and added to an edge via edge['evidence'] = some_list.
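As a rough illustration of that last point, an ETL script could collect GO and ECO CURIEs into one list and attach it to the edge. The sketch below repeats the proposed make_edge so it runs on its own; the helper dedupe_preserving_order and all identifiers passed in (EX:subject, EX:object, and so on) are placeholders invented for the example, not real KG2 data.

```python
from typing import List, Optional


def make_edge(subject_id: str,
              object_id: str,
              relation_curie: str,
              predicate_label: str,
              provided_by: str,
              update_date: Optional[str] = None) -> dict:
    # Copy of the proposal above; every edge starts with an empty evidence list.
    return {'subject': subject_id,
            'object': object_id,
            'edge_label': predicate_label,
            'relation': relation_curie,
            'negated': False,
            'publications': [],
            'publications_info': {},
            'update_date': update_date,
            'provided_by': provided_by,
            'evidence': []}


def dedupe_preserving_order(curies: List[str]) -> List[str]:
    """Hypothetical helper: drop repeated CURIEs, keeping first-seen order."""
    return list(dict.fromkeys(curies))


# Placeholder identifiers, for illustration only.
edge = make_edge('EX:subject', 'EX:object', 'EX:relation',
                 'example_predicate', 'EX:source')
some_list = dedupe_preserving_order(['GO.EC:IDA', 'ECO:0000303', 'GO.EC:IDA'])
edge['evidence'] = some_list
```

Keeping GO.EC and ECO CURIEs in the same list means consumers can tell them apart by prefix without needing a second edge slot.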

saramsey commented 4 years ago

Yes, let's go with a list of CURIE IDs. Thank you.

ecwood commented 3 years ago

Hi @saramsey, I saw this graphic in the All Things Provenance breakout group: [image]

Does that mean the entry should be has_evidence rather than evidence?

saramsey commented 3 years ago

Good catch. Sure, we can use has_evidence. It may make our lives simpler (specifically, for export to TSV and import into Neo4j and mediKanren) if every KG2 edge dict has a has_evidence list, which is empty by default. Does that make sense?
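To illustrate why an always-present list helps, here is a minimal sketch of a TSV writer that can treat every edge uniformly. It is not the actual KG2 export code, which is not shown in this thread; the function name edges_to_tsv, the column selection, and the ";" list delimiter are all assumptions made for the example.

```python
import csv
import io
from typing import List


def edges_to_tsv(edges: List[dict], columns: List[str],
                 list_delimiter: str = ";") -> str:
    """Serialize edge dicts to TSV text, joining list-valued slots into one cell."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for edge in edges:
        row = []
        for column in columns:
            # No special-casing for missing keys: every edge is assumed
            # to carry every column, including has_evidence.
            value = edge[column]
            if isinstance(value, list):
                value = list_delimiter.join(value)
            row.append(value)
        writer.writerow(row)
    return buffer.getvalue()


# Placeholder edges, for illustration only.
example_edges = [
    {'subject': 'EX:a', 'object': 'EX:b', 'has_evidence': []},
    {'subject': 'EX:c', 'object': 'EX:d',
     'has_evidence': ['GO.EC:IDA', 'ECO:0000303']},
]
tsv_text = edges_to_tsv(example_edges, ['subject', 'object', 'has_evidence'])
```

An edge without evidence simply yields an empty cell, so the column count stays constant across rows.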

ecwood commented 3 years ago

Hi @saramsey, That makes sense. I will commit that change to a branch so that you can review it, if that works for you.

saramsey commented 3 years ago

@ericawood thinks that DrugBank has some evidence-code-type data that could be put in the has_evidence property.

saramsey commented 3 years ago

We need to audit the KG2 code base to make sure that, in every module where we create an edge, we either do so via kg2_util.make_edge or otherwise add the new edge property has_evidence.
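A code-reading audit could be complemented by a run-time check on the edges each module emits. The following is only a sketch of that idea; the function find_edges_missing_has_evidence is hypothetical and not part of kg2_util.

```python
from typing import List


def find_edges_missing_has_evidence(edges: List[dict]) -> List[int]:
    """Return the positions of edges whose has_evidence slot is absent or not a list."""
    bad_positions = []
    for position, edge in enumerate(edges):
        if not isinstance(edge.get('has_evidence'), list):
            bad_positions.append(position)
    return bad_positions


# Placeholder edges: the second was built by hand and lacks the slot,
# the third has a bare string where a list is expected.
sample_edges = [
    {'subject': 'EX:a', 'object': 'EX:b', 'has_evidence': []},
    {'subject': 'EX:c', 'object': 'EX:d'},
    {'subject': 'EX:e', 'object': 'EX:f', 'has_evidence': 'GO.EC:IDA'},
]
problems = find_edges_missing_has_evidence(sample_edges)
```

Running a check like this on each ETL script's output would flag any module that builds edge dicts by hand without the new property.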

saramsey commented 3 years ago

This issue seems like a good one to bring back to the front-burner.

kvarforl commented 3 years ago

UniProtKB has a bunch of evidence codes too; some are associated with the names (which I moved to the description as part of RTXteam/RTX#1171), and others are associated with gene synonyms (isolated but not used anywhere as part of RTXteam/RTX#1259).

Example of evidence code associated with a gene synonym: UniProtKB:Q9Y4F9

GN   Name=RIPOR2;
GN   Synonyms=C6orf32, DIFF48, FAM65B, KIAA0386,
GN   PL48 {ECO:0000303|PubMed:9055809};
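If these codes were to be pulled out for has_evidence, one possible approach is a small regular-expression pass over the GN lines. This is a sketch only, assuming evidence appears in curly braces as comma-separated ECO-code|source pairs, as in the example above; the function name extract_evidence is hypothetical and this is not how the existing UniProtKB ETL script works.

```python
import re
from typing import List, Optional, Tuple

# Matches the contents of a {...} evidence block on a flat-file line.
EVIDENCE_BLOCK = re.compile(r"\{([^}]*)\}")


def extract_evidence(lines: List[str]) -> List[Tuple[str, Optional[str]]]:
    """Return (evidence code, source or None) pairs found in {...} blocks."""
    results = []
    for line in lines:
        for block in EVIDENCE_BLOCK.findall(line):
            for item in block.split(","):
                item = item.strip()
                if not item:
                    continue
                # An item looks like ECO:0000303|PubMed:9055809; the source
                # part after the pipe is treated as optional.
                code, _, source = item.partition("|")
                results.append((code, source or None))
    return results


gn_lines = [
    "GN   Name=RIPOR2;",
    "GN   Synonyms=C6orf32, DIFF48, FAM65B, KIAA0386,",
    "GN   PL48 {ECO:0000303|PubMed:9055809};",
]
evidence = extract_evidence(gn_lines)
```

The ECO code from each pair could go into has_evidence as a CURIE, while the PubMed part maps naturally onto the edge's publications list.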

Types of evidence codes are documented here

ecwood commented 3 years ago

Sources that do NOT appear to have evidence codes (this will be a long, edited comment as I gather more information):