Knowledge-Graph-Hub / kg-phenio

A graph for accessing and comparing knowledge concerning phenotypes across species and genetic backgrounds.
BSD 3-Clause "New" or "Revised" License

Missing edge categories? #88

Open caufieldjh opened 1 year ago

caufieldjh commented 1 year ago

@kevinschaper reports that kg-phenio may be missing edge categories.

caufieldjh commented 1 year ago

KG-Phenio has edge properties, but they're quite minimal. This is the header:

id  subject predicate   object  category    relation    knowledge_source

Not much going on there!

For comparison, here's the Monarch graph header:

id  original_subject    predicate   original_object category    aggregator_knowledge_source primary_knowledge_source    publications    qualifiers  provided_by has_evidence    stage_qualifier relation    knowledge_source    negated frequency_qualifier onset_qualifier sex_qualifier   evidence    subject object

Not all properties will be necessary for PHENIO, but the knowledge sources can certainly be expanded.
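
To make the gap concrete, here is a minimal sketch that diffs the two headers quoted above. The column names are copied from this thread; the helper name `missing_columns` is just for illustration and is not part of KGX or either pipeline.

```python
# Column names copied from the two headers quoted in this thread.
PHENIO_HEADER = "id subject predicate object category relation knowledge_source".split()
MONARCH_HEADER = (
    "id original_subject predicate original_object category "
    "aggregator_knowledge_source primary_knowledge_source publications "
    "qualifiers provided_by has_evidence stage_qualifier relation "
    "knowledge_source negated frequency_qualifier onset_qualifier "
    "sex_qualifier evidence subject object"
).split()


def missing_columns(have, want):
    """Return the columns in `want` that are absent from `have`, in `want` order."""
    present = set(have)
    return [col for col in want if col not in present]


# Edge properties the Monarch graph has that KG-Phenio currently lacks.
missing = missing_columns(PHENIO_HEADER, MONARCH_HEADER)
for col in missing:
    print(col)
```

Every KG-Phenio column already appears in the Monarch header, so the difference only runs one way.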

caufieldjh commented 1 year ago

The merged uPheno mapping table also needs a knowledge_source, but that can just be added at transform time as a KGX argument.

caufieldjh commented 1 year ago

Also want:

kevinschaper commented 1 year ago

I was just poking at some KGX validation output and noticed complaints about ZFA being an invalid prefix in ZP->ZFA associations. It might have something to do with the blank categories. Here's a summary of edges with blank categories:

| category | subject_namespace | predicate | object_namespace | primary_knowledge_source | count(*) |
|---|---|---|---|---|---|
|  | GO | biolink:subclass_of | GO | infores:go | 83592 |
|  | ZP | biolink:subclass_of | ZP | infores:zp | 60886 |
|  | FBbt | biolink:related_to | FBbt | infores:fbbt | 52322 |
|  | MONDO | biolink:subclass_of | MONDO | infores:mondo | 38795 |
|  | XPO | biolink:subclass_of | XPO | infores:xpo | 35717 |
|  | FBbt | biolink:subclass_of | FBbt | infores:fbbt | 31665 |
|  | UBERON | biolink:subclass_of | UBERON | infores:uberon | 23079 |
|  | HP | biolink:subclass_of | HP | infores:hp | 22532 |
|  | GO | biolink:related_to | GO | infores:go | 20260 |
|  | UBERON | biolink:related_to | UBERON | infores:uberon | 19247 |
|  | MP | biolink:subclass_of | MP | infores:mp | 18334 |
|  | WBbt | biolink:subclass_of | WBbt | infores:wbbt | 8168 |
|  | EMAPA | biolink:related_to | EMAPA | infores:emapa | 7037 |
|  | WBbt | biolink:related_to | WBbt | infores:wbbt | 6924 |
|  | CHEBI | biolink:subclass_of | CHEBI | infores:chebi | 6559 |
|  | ZP | biolink:related_to | ZFA | infores:upheno | 5840 |
|  | ZP | biolink:related_to | GO | infores:upheno | 5822 |
|  | CHEBI | biolink:related_to | CHEBI | infores:chebi | 5043 |
|  | EMAPA | biolink:subclass_of | EMAPA | infores:emapa | 4545 |
|  | EMAPA | biolink:subclass_of | UBERON | infores:emapa | 4477 |
|  | WBPhenotype | biolink:subclass_of | WBPhenotype | infores:wbphenotype | 3364 |
|  | ZFA | biolink:subclass_of | ZFA | infores:zfa | 3199 |
|  | MONDO | biolink:related_to | UBERON | infores:mondo | 3027 |
|  | WBbt | biolink:subclass_of | GO | infores:wbbt | 2766 |
|  | ZFA | biolink:related_to | ZFA | infores:zfa | 2752 |
|  | MP | biolink:related_to | UBERON | infores:upheno | 2577 |
|  | MONDO | biolink:related_to | MONDO | infores:mondo | 2458 |
|  | GO | biolink:related_to | CHEBI | infores:go | 2090 |
|  | ZFA | biolink:subclass_of | UBERON | infores:zfa | 2071 |
|  | HP | biolink:related_to | UBERON | infores:upheno | 1588 |
|  | MONDO | biolink:related_to | HP | infores:mondo | 1450 |
|  | GO | biolink:related_to | UBERON | infores:go | 1124 |
|  | MPATH | biolink:subclass_of | MPATH | infores:mpath | 946 |
|  | MP | biolink:related_to | GO | infores:upheno | 870 |
|  | FBbt | biolink:related_to | GO | infores:fbbt | 571 |
|  | ZP | biolink:related_to | CHEBI | infores:upheno | 455 |
|  | MONDO | biolink:related_to | GO | infores:mondo | 432 |
|  | UBERON | biolink:related_to | GO | infores:uberon | 423 |
|  | FBbt | biolink:subclass_of | UBERON | infores:fbbt | 369 |
|  | HP | biolink:related_to | CHEBI | infores:upheno | 359 |
|  | WBPhenotype | biolink:related_to | GO | infores:upheno | 325 |
|  | HP | biolink:related_to | GO | infores:upheno | 279 |
|  | WBPhenotype | biolink:related_to | WBbt | infores:upheno | 264 |
|  | MP | biolink:related_to | CHEBI | infores:upheno | 191 |
|  | XPO | biolink:related_to | GO | infores:upheno | 141 |
|  | ZP | biolink:related_to | MPATH | infores:upheno | 137 |
|  | MP | biolink:related_to | MPATH | infores:upheno | 134 |
|  | HP | biolink:related_to | MPATH | infores:upheno | 71 |
|  | MONDO | biolink:related_to | CHEBI | infores:mondo | 49 |
|  | WBbt | biolink:subclass_of | UBERON | infores:wbbt | 49 |
|  | MP | biolink:related_to | MP | infores:upheno | 32 |
|  | UBERON | biolink:related_to | CHEBI | infores:uberon | 32 |
|  | WBPhenotype | biolink:related_to | CHEBI | infores:upheno | 21 |
|  | MPATH | biolink:related_to | MPATH | infores:mpath | 3 |
|  | HP | biolink:related_to | HP | infores:upheno | 2 |
|  | UBERON | biolink:subclass_of | GO | infores:uberon | 2 |
|  | WBPhenotype | biolink:related_to | UBERON | infores:upheno | 1 |

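
A summary like the one above could be produced along these lines. This is a minimal sketch, not the query that was actually run: it assumes a tab-separated edge file with `subject`, `predicate`, `object`, `category`, and `primary_knowledge_source` columns, and the sample rows use placeholder IDs rather than real edges.

```python
import csv
import io
from collections import Counter


def namespace(curie):
    """Return the prefix of a CURIE, e.g. 'ZP' for 'ZP:0000001'."""
    return curie.split(":", 1)[0]


def summarize_blank_categories(handle):
    """Count edges with an empty category, grouped by namespaces, predicate, and source."""
    counts = Counter()
    for row in csv.DictReader(handle, delimiter="\t"):
        if (row.get("category") or "").strip():
            continue  # category is set, so the edge is not part of this summary
        key = (
            namespace(row["subject"]),
            row["predicate"],
            namespace(row["object"]),
            row.get("primary_knowledge_source") or "",
        )
        counts[key] += 1
    return counts.most_common()  # largest groups first


# Tiny made-up sample; the IDs are placeholders, not real edges.
SAMPLE = (
    "id\tsubject\tpredicate\tobject\tcategory\tprimary_knowledge_source\n"
    "e1\tZP:1\tbiolink:related_to\tZFA:1\t\tinfores:upheno\n"
    "e2\tZP:2\tbiolink:related_to\tZFA:2\t\tinfores:upheno\n"
    "e3\tHP:1\tbiolink:subclass_of\tHP:2\tbiolink:Association\tinfores:hp\n"
)

summary = summarize_blank_categories(io.StringIO(SAMPLE))
for (subj_ns, predicate, obj_ns, source), n in summary:
    print(subj_ns, predicate, obj_ns, source, n)
```
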
cmungall commented 1 year ago

I am not sure we have an implemented strategy for populating edge categories when going from OWL to KGX.

This could be done in KGX by inference from the subject and object ranges of the Biolink association classes, for example:

  gene to phenotypic feature association:
    is_a: association
    exact_mappings:
      - WBVocab:Gene-Phenotype-Association
    defining_slots:
      - subject
      - object
    mixins:
      - entity to phenotypic feature association mixin
      - gene to entity association mixin
    slot_usage:
      subject:
        range: gene or gene product
        description: "gene in which variation is correlated with the phenotypic feature"
        examples:
          - value: HGNC:2197
            description: "COL1A1 (Human)"
      object:
        range: phenotypic feature

However, I would do this with linkml:classification_rules now.

This will probably not be straightforward to add to KGX. @kevinschaper, how much does our validation strategy depend on this being present?
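
As a rough illustration of the inference idea, here is a minimal sketch that picks an edge category from the categories of the subject and object nodes. It is not KGX or LinkML code: `RULES` and `infer_edge_category` are hypothetical names, the single rule is hand-written from the YAML above (treating `biolink:Gene` as satisfying the "gene or gene product" range), and a real rule table would be derived from the Biolink Model.

```python
# Hypothetical rule table: (subject node category, object node category)
# -> edge category. A real table would be generated from the Biolink Model.
RULES = {
    ("biolink:Gene", "biolink:PhenotypicFeature"):
        "biolink:GeneToPhenotypicFeatureAssociation",
}
DEFAULT_CATEGORY = "biolink:Association"


def infer_edge_category(edge, node_categories, rules=RULES):
    """Return the edge's category, inferring one only when it is blank."""
    if edge.get("category"):
        return edge["category"]  # never overwrite an existing value
    key = (node_categories.get(edge["subject"]), node_categories.get(edge["object"]))
    return rules.get(key, DEFAULT_CATEGORY)  # fall back to the generic class


# HGNC:2197 is the example from the YAML; the HP and ZP/ZFA IDs are placeholders.
node_categories = {
    "HGNC:2197": "biolink:Gene",
    "HP:0000001": "biolink:PhenotypicFeature",
}
gene_edge = {"subject": "HGNC:2197", "object": "HP:0000001", "category": ""}
other_edge = {"subject": "ZP:1", "object": "ZFA:1", "category": ""}

print(infer_edge_category(gene_edge, node_categories))
print(infer_edge_category(other_edge, node_categories))
```

Only subject and object are consulted because those are the `defining_slots` in the class definition; edges whose node categories match no rule fall back to the generic association class.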

kevinschaper commented 1 year ago

It looks like we're hitting some KGX validation issues within Translator infrastructure that might be coming from blank category fields on edges. Would it work to just fill them in with biolink:Association rather than nulls?

(I think I might do that in my phenio kgx massaging, and of course it won't do anything once they're set)
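
The fill-in could look something like the sketch below. It assumes a tab-separated edge file with a `category` column, as in the KG-Phenio header quoted earlier; the function name and the sample rows are made up for illustration, and this is plain `csv` handling rather than anything KGX provides.

```python
import csv
import io

DEFAULT_CATEGORY = "biolink:Association"


def fill_blank_categories(in_handle, out_handle, default=DEFAULT_CATEGORY):
    """Copy a tab-separated edge file, filling empty category cells with `default`."""
    reader = csv.DictReader(in_handle, delimiter="\t")
    if "category" not in (reader.fieldnames or []):
        raise ValueError("edge file has no 'category' column")
    writer = csv.DictWriter(
        out_handle, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    filled = 0
    for row in reader:
        if not (row.get("category") or "").strip():
            row["category"] = default  # only blank cells are touched
            filled += 1
        writer.writerow(row)
    return filled


# Made-up sample rows using the KG-Phenio column layout; IDs are placeholders.
SAMPLE = (
    "id\tsubject\tpredicate\tobject\tcategory\trelation\tknowledge_source\n"
    "e1\tZP:1\tbiolink:related_to\tZFA:1\t\t\tinfores:upheno\n"
    "e2\tHP:1\tbiolink:subclass_of\tHP:2\tbiolink:Association\t\tinfores:hp\n"
)

out = io.StringIO()
filled = fill_blank_categories(io.StringIO(SAMPLE), out)
print(filled)
print(out.getvalue())
```

Rows that already carry a category pass through unchanged, so the step becomes a no-op once categories are set upstream.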