RTXteam / RTX-KG2

Build system for the RTX-KG2 biomedical knowledge graph, part of the ARAX reasoning system (https://github.com/RTXTeam/RTX)
MIT License

Convert Nodes Publications List into Edges with Provenance #37

Open ecwood opened 4 years ago

ecwood commented 4 years ago

UAB noted that edges without provenance are unreliable, and that some nodes (particularly UniProt nodes) have useful information in their descriptions with publications attached. Those publications currently go into the node's publication list rather than onto an edge, where UAB would find them more helpful. ETLing a dump of https://www.ebi.ac.uk/QuickGO/annotations?downloadLimit=100&reference=PMID looks like it would help with this. For some sources, this will not be feasible, since these types of edges don't exist in the data.

saramsey commented 3 years ago

Is this related to issue RTXteam/RTX-KG2#32?

ecwood commented 3 years ago

No, this is referring to (in the short term) ETLing an additional source so that edges between UniProt and GO would have PMIDs as evidence. In the long term, UAB essentially wants us to take all of the PMIDs that are currently in the nodes' publication lists and turn them into edges. As is, UAB doesn't have any use for them because they aren't supporting any claims.
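
To make the short-term idea concrete, here is a rough sketch of what the ETL step might look like. It assumes the QuickGO dump is a tab-separated file with a header row; the column names below are my assumption and need to be checked against an actual download. The output dictionary keys (`subject`, `predicate`, `object`, `publications`) are also hypothetical and not necessarily the KG2 edge schema. The sample rows are invented placeholders, not real annotations.

```python
import csv
import io
from collections import defaultdict

# Assumed column names for a QuickGO-style TSV export; verify against a real file.
COL_DB = "GENE PRODUCT DB"
COL_ID = "GENE PRODUCT ID"
COL_QUALIFIER = "QUALIFIER"
COL_GO = "GO TERM"
COL_REF = "REFERENCE"


def annotations_to_edges(tsv_text):
    """Collapse annotation rows into one edge per (subject, predicate, object),
    collecting every PMID that supports that triple."""
    pmids_by_triple = defaultdict(set)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        reference = row[COL_REF].strip()
        # Keep only UniProt rows whose evidence is a PubMed reference.
        if row[COL_DB] != "UniProtKB" or not reference.startswith("PMID:"):
            continue
        triple = ("UniProtKB:" + row[COL_ID], row[COL_QUALIFIER], row[COL_GO])
        pmids_by_triple[triple].add(reference)
    return [
        {"subject": s, "predicate": p, "object": o, "publications": sorted(pmids)}
        for (s, p, o), pmids in sorted(pmids_by_triple.items())
    ]


# Invented placeholder rows, only to show the shape of the input.
HEADER = [COL_DB, COL_ID, COL_QUALIFIER, COL_GO, COL_REF]
ROWS = [
    ["UniProtKB", "P12345", "enables", "GO:0005515", "PMID:111"],
    ["UniProtKB", "P12345", "enables", "GO:0005515", "PMID:222"],
    ["UniProtKB", "P12345", "involved_in", "GO:0006915", "GO_REF:0000033"],
    ["UniProtKB", "Q99999", "located_in", "GO:0005634", "PMID:333"],
]
SAMPLE = "\n".join("\t".join(fields) for fields in [HEADER] + ROWS)

if __name__ == "__main__":
    for edge in annotations_to_edges(SAMPLE):
        print(edge)
```

Grouping by triple means each UniProt-to-GO edge carries all of its supporting PMIDs in one place, which is the part UAB says is missing when the PMIDs sit only on the node. Rows with non-PubMed references are dropped in this sketch; whether that is the right call is open.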

saramsey commented 3 years ago

Sure, if this helps us put PMIDs on our KG2 edges between UniProt proteins and GO terms, that's a good thing. Factors for you to consider:

  1. download size
  2. anticipated complexity of writing the ETL script, based on the data format
  3. evidence that this resource is being updated (I assume it is, but it is worth checking)