RTXteam / RTX-KG2

Build system for the RTX-KG2 biomedical knowledge graph, part of the ARAX reasoning system (https://github.com/RTXTeam/RTX)
MIT License

knowledge_level on all edges? #358

Open edeutsch opened 5 months ago

edeutsch commented 5 months ago

From Tyler's post in Architecture today:

[image from Tyler's post; not preserved]

saramsey commented 5 months ago

April 1 deadline to get this coded and deployed.

saramsey commented 5 months ago

Steve: look in the Biolink model to see if we can get the controlled vocabulary for knowledge_level

saramsey commented 5 months ago

Here is the schema that we will need to implement. There are actually two new edge properties, agent_type and knowledge_level:

https://github.com/NCATSTranslator/ReasonerAPI/blob/master/ImplementationGuidance/Specifications/knowledge_level_agent_type_specification.md

saramsey commented 5 months ago

I'm hearing that we should plan to have this rolled out in a new build of the RTX-KG2 KP, tested in CI, and ready to deploy to ITRB TEST by mid-March (final date to be announced at the relay next week).

saramsey commented 3 months ago

Lili and I are envisioning a new module that processes the KG2 edges JSON-lines file, downstream of filter_kg_and_remap_predicates.py. In rough pseudocode:


# load JSON-lines edge file

# iterate over edges

# for each edge, assume the edge exists as a python dictionary `edge_dict`

primary_knowledge_source = edge_dict['primary_knowledge_source']
edge_dict['attributes'] = [
    {
        "attribute_type_id": "biolink:agent_type",
        "value": "not_provided",
        "attribute_source": primary_knowledge_source
    },
    {
        "attribute_type_id": "biolink:knowledge_level",
        "value": "not_provided",
        "attribute_source": primary_knowledge_source
    }
]

# write the edges list back out as a JSON-lines file (with a new filename)
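
A minimal runnable sketch of the pseudocode above, assuming each line of the edges file is one JSON object with a `primary_knowledge_source` key. The function names `add_placeholder_attributes` and `process_edges_file`, and the path parameters, are hypothetical and not part of the KG2 build system.

```python
import json


def add_placeholder_attributes(edge_dict):
    """Attach placeholder agent_type and knowledge_level attributes to one edge."""
    primary_knowledge_source = edge_dict["primary_knowledge_source"]
    edge_dict["attributes"] = [
        {
            "attribute_type_id": attribute_type_id,
            "value": "not_provided",
            "attribute_source": primary_knowledge_source,
        }
        for attribute_type_id in ("biolink:agent_type", "biolink:knowledge_level")
    ]
    return edge_dict


def process_edges_file(input_path, output_path):
    """Stream a JSON-lines edges file, writing annotated edges to a new file."""
    with open(input_path) as infile, open(output_path, "w") as outfile:
        for line in infile:
            if not line.strip():
                continue  # skip blank lines
            edge_dict = add_placeholder_attributes(json.loads(line))
            outfile.write(json.dumps(edge_dict) + "\n")
```

Streaming one line at a time avoids holding the whole edges file in memory, which is an assumption about file size rather than a stated requirement.
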

saramsey commented 3 months ago

TBD: we will need to come up with a mapping strategy (which may be based on yet another YAML file) to determine which values to assign instead of "not_provided".
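
One possible shape for such a mapping, sketched as plain Python dictionaries that could equally be loaded from a YAML file. The source identifiers and the values assigned to them are illustrative placeholders, not the real KG2 mapping, and the allowed-value sets are an assumption that should be checked against the Biolink model. Unmapped sources fall back to `not_provided`.

```python
# Allowed values, assumed from the Biolink model enumerations; verify before use.
KNOWLEDGE_LEVELS = {
    "knowledge_assertion", "logical_entailment", "prediction",
    "statistical_association", "observation", "not_provided",
}
AGENT_TYPES = {
    "manual_agent", "automated_agent", "data_analysis_pipeline",
    "computational_model", "text_mining_agent", "image_processing_agent",
    "manual_validation_of_automated_agent", "not_provided",
}

# Hypothetical mapping keyed by primary_knowledge_source (illustrative only).
SOURCE_MAP = {
    "infores:example-curated-db": {
        "knowledge_level": "knowledge_assertion",
        "agent_type": "manual_agent",
    },
    "infores:example-text-mining": {
        "knowledge_level": "not_provided",
        "agent_type": "text_mining_agent",
    },
}


def validate_source_map(source_map):
    """Raise ValueError if any mapped value is outside the allowed sets."""
    for source, entry in source_map.items():
        if entry["knowledge_level"] not in KNOWLEDGE_LEVELS:
            raise ValueError(f"bad knowledge_level for {source}")
        if entry["agent_type"] not in AGENT_TYPES:
            raise ValueError(f"bad agent_type for {source}")


def lookup(source_map, primary_knowledge_source):
    """Return (knowledge_level, agent_type), defaulting to not_provided."""
    entry = source_map.get(primary_knowledge_source, {})
    return (entry.get("knowledge_level", "not_provided"),
            entry.get("agent_type", "not_provided"))
```

Validating the map once at load time would surface a typo in the mapping file before any edges are written.
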

saramsey commented 3 months ago

We are hoping to get this, as KG2.10.Xc, into the Translator release that takes place after the Eel release. That would probably mean a late June deadline for requesting that an ARAX based on KG2.10.Xc be deployed into ITRB TEST, which in turn would mean that KG2.10.Xpre would need to be built and validated for this issue by mid-May. We are aiming to do the treats refactor work (#373) before this, i.e., in the Eel release.

ecwood commented 2 days ago

@saramsey Do we need this in an attributes list? Currently, we don't use an attributes list in RTX-KG2pre. That data type won't serialize nicely into Neo4j (since it just becomes a big string). I think that the attributes list is usually created downstream of KG2pre. I think this schema would make more sense, given that the attribute_source will be identical to the source of the edge anyway (the primary_knowledge_source). Further, it would be consistent with all of the other edge properties currently in KG2.

edge_dict['agent_type'] = agent_type_map[primary_knowledge_source]
edge_dict['knowledge_level'] = knowledge_level_map[primary_knowledge_source]
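
To illustrate the serialization concern, here is a small sketch contrasting the two layouts. It assumes a loader that can only store scalar property values, so a nested list has to be flattened with `json.dumps`. The maps and the source identifier are hypothetical. `dict.get` is used here so that a source missing from a map yields `not_provided` rather than a `KeyError`.

```python
import json

# Hypothetical maps keyed by primary_knowledge_source (illustrative values).
agent_type_map = {"infores:example-source": "manual_agent"}
knowledge_level_map = {"infores:example-source": "knowledge_assertion"}


def add_flat_properties(edge_dict):
    """Add agent_type and knowledge_level as top-level scalar edge properties."""
    source = edge_dict["primary_knowledge_source"]
    edge_dict["agent_type"] = agent_type_map.get(source, "not_provided")
    edge_dict["knowledge_level"] = knowledge_level_map.get(source, "not_provided")
    return edge_dict


def scalarize(edge_dict):
    """Mimic a scalar-only loader: nested values become JSON strings."""
    return {
        key: json.dumps(value) if isinstance(value, (list, dict)) else value
        for key, value in edge_dict.items()
    }


flat_edge = scalarize(add_flat_properties(
    {"primary_knowledge_source": "infores:example-source"}))
nested_edge = scalarize({
    "primary_knowledge_source": "infores:example-source",
    "attributes": [{"attribute_type_id": "biolink:agent_type",
                    "value": "manual_agent"}],
})
# flat_edge["agent_type"] stays a short queryable string, whereas
# nested_edge["attributes"] is one opaque JSON string.
```
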

ecwood commented 2 days ago

@saramsey Are we currently expected to fulfill the short term or long term specification on https://github.com/NCATSTranslator/ReasonerAPI/blob/master/ImplementationGuidance/Specifications/knowledge_level_agent_type_specification.md?

saramsey commented 1 day ago

@ecwood no, I don't think we need it in an edge attributes list in KG2pre. It could be two new edge properties, knowledge_level and agent_type, that are turned into edge attributes in the RTX-KG2 TRAPI interface. @amykglen and @sundareswarpullela, does that make sense?

But yes, @ecwood, at this point, Translator has moved to implement the long-term spec, i.e., each edge needs to have a knowledge_level and agent_type documented on it. But I think for the purpose of the KG2pre build, it makes sense to do it using two new edge properties, knowledge_level and agent_type, which can be turned into TRAPI edge attributes in the RTX-KG2 TRAPI interface.
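
As a rough illustration of that last step, the two flat properties could be lifted into TRAPI-style edge attributes along these lines. The function name is hypothetical and this is not the actual RTX-KG2 API code. The attribute fields mirror those used in the pseudocode earlier in this thread.

```python
def edge_properties_to_attributes(edge_dict):
    """Build TRAPI-style attribute dicts from flat KG2 edge properties."""
    attributes = []
    for property_name in ("knowledge_level", "agent_type"):
        attributes.append({
            "attribute_type_id": f"biolink:{property_name}",
            # fall back to not_provided when the property is absent
            "value": edge_dict.get(property_name, "not_provided"),
            "attribute_source": edge_dict.get("primary_knowledge_source"),
        })
    return attributes
```

Keeping this conversion in the API layer would leave the KG2pre edge as flat scalar properties, in line with the other KG2 edge properties.
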

amykglen commented 13 hours ago

yes, adding knowledge_level and agent_type as two new edge properties in KG2pre makes sense to me. that should be easy to incorporate into KG2c and load into TRAPI format in the RTX-KG2 API code (as we do for the other KG2 properties).