Open caufieldjh opened 1 year ago
KG-Phenio has edge properties, they're just quite minimal. This is the header:
id subject predicate object category relation knowledge_source
Not much going on there!
For comparison, here's the Monarch graph header:
id original_subject predicate original_object category aggregator_knowledge_source primary_knowledge_source publications qualifiers provided_by has_evidence stage_qualifier relation knowledge_source negated frequency_qualifier onset_qualifier sex_qualifier evidence subject object
Not all properties will be necessary for PHENIO, but the knowledge sources can certainly be expanded.
The merged Upheno mapping table also needs a knowledge_source added, but that can just be added at transform time as a KGX argument.
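For illustration only, here is a minimal sketch of what adding that column amounts to. It does not use KGX itself and does not show the KGX argument mentioned above; it post-processes an edge TSV in plain Python. The function name `add_knowledge_source` and its parameters are hypothetical, and `infores:upheno` is assumed as the default value because that is the source reported for Upheno-derived edges elsewhere in this thread.

```python
import csv
import io


def add_knowledge_source(tsv_text, source="infores:upheno", column="knowledge_source"):
    """Return the edge TSV with `column` present and filled wherever it was blank.

    Hypothetical helper: a stand-in for supplying the value at KGX transform time.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    fieldnames = list(reader.fieldnames or [])
    if column not in fieldnames:
        # Append the column if the table does not have it yet.
        fieldnames.append(column)

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Only fill missing or empty values; existing sources are left untouched.
        if not row.get(column):
            row[column] = source
        writer.writerow(row)
    return out.getvalue()


sample = "id\tsubject\tpredicate\tobject\ne1\tZP:1\tbiolink:related_to\tZFA:2\n"
print(add_knowledge_source(sample))
```

Rows that already carry a knowledge source pass through unchanged, so the step is safe to rerun.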
Also want:
I was just poking at some kgx validation output and noticed complaints about ZFA being an invalid prefix in ZP->ZFA associations. It might have something to do with the blank categories. Here's a summary of edges with blank categories:
| category | subject_namespace | predicate | object_namespace | primary_knowledge_source | count(*) |
|---|---|---|---|---|---|
| | GO | biolink:subclass_of | GO | infores:go | 83592 |
| | ZP | biolink:subclass_of | ZP | infores:zp | 60886 |
| | FBbt | biolink:related_to | FBbt | infores:fbbt | 52322 |
| | MONDO | biolink:subclass_of | MONDO | infores:mondo | 38795 |
| | XPO | biolink:subclass_of | XPO | infores:xpo | 35717 |
| | FBbt | biolink:subclass_of | FBbt | infores:fbbt | 31665 |
| | UBERON | biolink:subclass_of | UBERON | infores:uberon | 23079 |
| | HP | biolink:subclass_of | HP | infores:hp | 22532 |
| | GO | biolink:related_to | GO | infores:go | 20260 |
| | UBERON | biolink:related_to | UBERON | infores:uberon | 19247 |
| | MP | biolink:subclass_of | MP | infores:mp | 18334 |
| | WBbt | biolink:subclass_of | WBbt | infores:wbbt | 8168 |
| | EMAPA | biolink:related_to | EMAPA | infores:emapa | 7037 |
| | WBbt | biolink:related_to | WBbt | infores:wbbt | 6924 |
| | CHEBI | biolink:subclass_of | CHEBI | infores:chebi | 6559 |
| | ZP | biolink:related_to | ZFA | infores:upheno | 5840 |
| | ZP | biolink:related_to | GO | infores:upheno | 5822 |
| | CHEBI | biolink:related_to | CHEBI | infores:chebi | 5043 |
| | EMAPA | biolink:subclass_of | EMAPA | infores:emapa | 4545 |
| | EMAPA | biolink:subclass_of | UBERON | infores:emapa | 4477 |
| | WBPhenotype | biolink:subclass_of | WBPhenotype | infores:wbphenotype | 3364 |
| | ZFA | biolink:subclass_of | ZFA | infores:zfa | 3199 |
| | MONDO | biolink:related_to | UBERON | infores:mondo | 3027 |
| | WBbt | biolink:subclass_of | GO | infores:wbbt | 2766 |
| | ZFA | biolink:related_to | ZFA | infores:zfa | 2752 |
| | MP | biolink:related_to | UBERON | infores:upheno | 2577 |
| | MONDO | biolink:related_to | MONDO | infores:mondo | 2458 |
| | GO | biolink:related_to | CHEBI | infores:go | 2090 |
| | ZFA | biolink:subclass_of | UBERON | infores:zfa | 2071 |
| | HP | biolink:related_to | UBERON | infores:upheno | 1588 |
| | MONDO | biolink:related_to | HP | infores:mondo | 1450 |
| | GO | biolink:related_to | UBERON | infores:go | 1124 |
| | MPATH | biolink:subclass_of | MPATH | infores:mpath | 946 |
| | MP | biolink:related_to | GO | infores:upheno | 870 |
| | FBbt | biolink:related_to | GO | infores:fbbt | 571 |
| | ZP | biolink:related_to | CHEBI | infores:upheno | 455 |
| | MONDO | biolink:related_to | GO | infores:mondo | 432 |
| | UBERON | biolink:related_to | GO | infores:uberon | 423 |
| | FBbt | biolink:subclass_of | UBERON | infores:fbbt | 369 |
| | HP | biolink:related_to | CHEBI | infores:upheno | 359 |
| | WBPhenotype | biolink:related_to | GO | infores:upheno | 325 |
| | HP | biolink:related_to | GO | infores:upheno | 279 |
| | WBPhenotype | biolink:related_to | WBbt | infores:upheno | 264 |
| | MP | biolink:related_to | CHEBI | infores:upheno | 191 |
| | XPO | biolink:related_to | GO | infores:upheno | 141 |
| | ZP | biolink:related_to | MPATH | infores:upheno | 137 |
| | MP | biolink:related_to | MPATH | infores:upheno | 134 |
| | HP | biolink:related_to | MPATH | infores:upheno | 71 |
| | MONDO | biolink:related_to | CHEBI | infores:mondo | 49 |
| | WBbt | biolink:subclass_of | UBERON | infores:wbbt | 49 |
| | MP | biolink:related_to | MP | infores:upheno | 32 |
| | UBERON | biolink:related_to | CHEBI | infores:uberon | 32 |
| | WBPhenotype | biolink:related_to | CHEBI | infores:upheno | 21 |
| | MPATH | biolink:related_to | MPATH | infores:mpath | 3 |
| | HP | biolink:related_to | HP | infores:upheno | 2 |
| | UBERON | biolink:subclass_of | GO | infores:uberon | 2 |
| | WBPhenotype | biolink:related_to | UBERON | infores:upheno | 1 |
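The query behind this summary isn't shown, so the following is only a rough sketch of how a similar tally could be produced from a KGX edge TSV in plain Python. The function names and the `source_column` parameter are hypothetical. The namespace is assumed to be the CURIE prefix before the first colon.

```python
import csv
import io
from collections import Counter


def namespace(curie):
    """Return the prefix of a CURIE, e.g. 'ZP' for 'ZP:0000001'."""
    return curie.split(":", 1)[0]


def summarize_blank_categories(tsv_text, source_column="primary_knowledge_source"):
    """Count edges with an empty category, grouped by
    (subject namespace, predicate, object namespace, knowledge source)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    counts = Counter()
    for row in reader:
        if (row.get("category") or "").strip():
            continue  # edge already has a category, so skip it
        key = (
            namespace(row["subject"]),
            row["predicate"],
            namespace(row["object"]),
            row.get(source_column) or "",
        )
        counts[key] += 1
    # Largest groups first, like the table above.
    return counts.most_common()
```

If the edge file uses `knowledge_source` rather than `primary_knowledge_source`, pass that name as `source_column`.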
I am not sure we have an implemented strategy for populating edge categories when going from owl->kgx.
This could be done in kgx by inference, using the subject and object ranges of a Biolink association class such as this one:

    gene to phenotypic feature association:
      is_a: association
      exact_mappings:
        - WBVocab:Gene-Phenotype-Association
      defining_slots:
        - subject
        - object
      mixins:
        - entity to phenotypic feature association mixin
        - gene to entity association mixin
      slot_usage:
        subject:
          range: gene or gene product
          description: "gene in which variation is correlated with the phenotypic feature"
          examples:
            - value: HGNC:2197
              description: "COL1A1 (Human)"
        object:
          range: phenotypic feature

However, I would do this with linkml:classification_rules now.
This will probably not be straightforward to add to kgx. @kevinschaper, how much does our validation strategy depend on this being present?
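To make the inference idea concrete, here is a small sketch in plain Python. It is not KGX code and does not use LinkML classification rules. The rule table and the function `infer_edge_category` are hypothetical; the table has a single entry modelled on the class definition quoted above. Node categories are assumed to be available as Biolink class CURIEs.

```python
# Each rule: (allowed subject categories, allowed object categories, edge category).
# Hypothetical rule table with one entry, for illustration only.
RULES = [
    (
        {"biolink:Gene"},
        {"biolink:PhenotypicFeature"},
        "biolink:GeneToPhenotypicFeatureAssociation",
    ),
]


def infer_edge_category(subject_categories, object_categories,
                        rules=RULES, default="biolink:Association"):
    """Return the first rule's edge category whose subject and object
    constraints both match; fall back to `default` otherwise."""
    subject_set = set(subject_categories)
    object_set = set(object_categories)
    for allowed_subjects, allowed_objects, edge_category in rules:
        if allowed_subjects & subject_set and allowed_objects & object_set:
            return edge_category
    return default


print(infer_edge_category(["biolink:Gene"], ["biolink:PhenotypicFeature"]))
print(infer_edge_category(["biolink:AnatomicalEntity"], ["biolink:AnatomicalEntity"]))
```

A real implementation would also need to account for predicates, mixins, and class hierarchy, which this sketch ignores.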
It looks like we're hitting some kgx validation issues within translator infrastructure that might be coming in from blank category fields on edges. Would it work to just fill in with biolink:Association rather than nulls?
(I think I might do that in my phenio kgx massaging, and of course it won't do anything once they're set)
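A minimal sketch of that kind of fill-in, assuming edges are already loaded as dictionaries (for example via `csv.DictReader`). The function name `fill_blank_edge_categories` is hypothetical, and this is not the actual massaging code referred to above.

```python
def fill_blank_edge_categories(rows, default="biolink:Association"):
    """Return copies of the edge rows with empty or missing categories set to `default`.

    Rows that already have a category are returned unchanged, so the step
    becomes a no-op once categories are populated upstream.
    """
    filled = []
    for row in rows:
        row = dict(row)  # copy so the caller's data is not mutated
        if not (row.get("category") or "").strip():
            row["category"] = default
        filled.append(row)
    return filled


edges = [
    {"subject": "ZP:1", "predicate": "biolink:related_to", "object": "ZFA:2", "category": ""},
    {"subject": "GO:1", "predicate": "biolink:subclass_of", "object": "GO:2", "category": None},
]
for edge in fill_blank_edge_categories(edges):
    print(edge["subject"], edge["category"])
```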
@kevinschaper reports that kg-phenio may be missing edge categories.