Closed kevinschaper closed 10 months ago
Looks like it's parsed from obograph: https://github.com/biolink/kgx/blob/4e2d410fbc9b130932f3ad064239ee22044e84c1/kgx/source/obograph_source.py#L334-L336 but the equivalent JSON and OWL parsers don't explicitly parse the field.
@caufieldjh It seems like that means that there isn't exactly a bug in kgx, but we do need to figure out how to pass the node properties into
transform(
inputs=[data_file_json],
input_format="obojson",
output=os.path.join(self.output_dir, name),
output_format="tsv",
stream=False,
)
Looking at the method declaration, it seems like transform_config is the only way, but since we don't have a config yaml currently, maybe we'd want to add the option to pass in node_properties?
def transform(
inputs: Optional[List[str]],
input_format: Optional[str] = None,
input_compression: Optional[str] = None,
output: Optional[str] = None,
output_format: Optional[str] = None,
output_compression: Optional[str] = None,
stream: bool = False,
node_filters: Optional[List[Tuple[str, str]]] = None,
edge_filters: Optional[List[Tuple[str, str]]] = None,
transform_config: str = None,
source: Optional[List] = None,
knowledge_sources: Optional[List[Tuple[str, str]]] = None,
# this parameter doesn't get used, but I leave it in
# for now, in case it signifies an unimplemented concept
# destination: Optional[List] = None,
processes: int = 1,
infores_catalog: Optional[str] = None,
)
I agree - just need the extra param for the transform function, though this does suggest that using a config yaml instead may be a good idea. Is that feasible for the Monarch builds?
The transform call I pasted in above is from kg-phenio, I think that's where we'd need to add config yaml
Ah, well in that case it should be easy to test. I'll open an issue over there and start a PR.
It also looks like the way the deprecated flag is parsed in obograph_source misses many potential instances, since it only looks at meta
. If I go from owl -> json -> tsv (as in kg-phenio or other kgs) then this appears to miss the majority of the deprecated flags. They just don't end up in the output tsv nodes.
I think this issue may now be partially resolved by my PR https://github.com/biolink/kgx/pull/475 - which is still a draft PR for the moment.
However, it may be helpful @caufieldjh and @kevinschaper to have you give me a small sample file of the source data from which you feel KGX is not propagating the 'deprecated
' flag.
Better yet, if you can, please send me a small snippet of Python KGX transform code which resembles the code what you are currently doing in your (phenio?) code base.
In that way, I can use both the sample data and your snippet to add another unit test for the owl -> json -> tsv transitive transformation, then use it to check if there are any other loose ends in the KGX code that need fixing.
Thanks @RichardBruskiewich ! The source data (from phenio.json) looks like this:
{
"id" : "http://purl.obolibrary.org/obo/GO_0051370",
"lbl" : "obsolete ZASP binding",
"type" : "INDIVIDUAL",
"meta" : {
"definition" : {
"val" : "OBSOLETE. Binding to Z-band alternatively spliced PDZ motif protein (ZASP). ZASP is a Z-band protein specifically expressed in heart and skeletal muscle. This protein contains N-terminal PDZ domain and C-terminal LIM domain.",
"xrefs" : [ "PMID:10427098", "PMID:11699871" ]
},
"comments" : [ "This term was made obsolete because it represents binding to an individual protein." ],
"synonyms" : [ {
"pred" : "hasExactSynonym",
"val" : "Z-band alternatively spliced PDZ-motif protein binding"
}, {
"pred" : "hasExactSynonym",
"val" : "ZASP binding"
} ],
"basicPropertyValues" : [ {
"pred" : "http://purl.obolibrary.org/obo/IAO_0100001",
"val" : "GO:0008092"
}, {
"pred" : "http://www.geneontology.org/formats/oboInOwl#hasOBONamespace",
"val" : "molecular_function"
} ],
"deprecated" : true
}
The transform kg-phenio uses is this one: https://github.com/Knowledge-Graph-Hub/kg-phenio/blob/master/kg_phenio/transform_utils/phenio/phenio_transform.py with the specific obojson -> kgx tsv transform code as this: https://github.com/Knowledge-Graph-Hub/kg-phenio/blob/c3f7d0bc0f38ccd60a2724510edebe4cab63b659/kg_phenio/transform_utils/phenio/phenio_transform.py#L122-L129
I assume that the phenio.json data is normally properly wrapped within a suitable KGX-like objson-like wrapper, something like:
{
"graphs": [
{
"nodes": [
{
"id": "http://purl.obolibrary.org/obo/GO_0051370",
"lbl": "obsolete ZASP binding",
...etc,
"deprecated": true
}
}
],
"edges": [],
"id": "http://purl.obolibrary.org/obo/phenio.owl",
"meta": {
"subsets": [],
"xrefs": [],
"basicPropertyValues": []
}
}
]
}
Using this basic structure, I confirm that the obograph source parser detects the deprecated
property in the data.
I will now iterate towards a unit test to detect transitive conservation of the deprecated
status across various transformations.
I assume that the phenio.json data is normally properly wrapped within a suitable KGX-like objson-like wrapper, something like:
Yes, it is - apologies for the rough snippet
Hi @caufieldjh and @kevinschaper , I think the PR https://github.com/biolink/kgx/pull/475 is close to completed. I added a unit test along the lines of the 'transform' snippet above, with the phenio.json, and it seems to pass.
I suppose I also now need to check the owl <-> (json or tsv) path too?
Fully resolved by https://github.com/biolink/kgx/pull/475 which is now released in KGX release v2.2.5
Thanks again @RichardBruskiewich
Mainstream use of 'deprecate
' flag on instances (not classes) of biolink:Entity now supported by PR https://github.com/biolink/biolink-model/pull/1421; now supported in Biolink Model release 3.6.0
Looking at kg-phenio and kg-obo, (and digging into the kg-phenio build), it looks like kgx doesn't produce a
deprecated
column when converting from obojson.Related to / inspired by: https://github.com/monarch-initiative/monarch-app/issues/421