EBISPOT / scatlas_ontology

SCA_Ontology

Ensure scao has complete part_of chains #20

Open dosumis opened 2 years ago

dosumis commented 2 years ago

We need to ensure that SCAO has complete (inferred) chains of part_of relationships between all terms in the seed. The current pipeline attempts to do this using robot filter + a rather complex and confusing set of options. In testing we have become aware that this approach does not produce complete results.

Proposal: the current approach should be replaced by one using Soufflé, which is now part of the ODK container.

This is roughly the algorithm:

  1. Input: termfile T (of terms to retain), ontology O
  2. robot materialise part_of (BFO:0000050) in O
  3. Existentials to direct links (https://github.com/balhoff/relation-graph)
  4. Load the relation graph into Soufflé (use Soufflé to saturate the closure, or does relation-graph do this already?)
  5. remove all closure triples that do not have a term from T in both the subject and object positions (Soufflé rule(s))
  6. reduce closure (Perhaps unnecessary as we can use robot to strip redundancy? OTOH, this might be the most efficient approach)
  7. transform back to existentials (Note from Nico: can do in Soufflé -> EXISTCLOSURE)
  8. robot filter -T T, --term OP (i.e. part of) *
  9. robot merge -i EXISTCLOSURE

*. filter --term-file components/$*_simple_seed.txt --select "annotations ontology anonymous self" --trim true --signature true \
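
To make steps 4 to 6 concrete, here is a minimal Python sketch of the graph logic only: saturate the part_of closure, keep triples whose subject and object are both in T, then strip redundant edges. It is an illustration, not the proposed Soufflé implementation. The function names and the toy anatomy edges are assumptions made up for this example, and it assumes part_of edges have already been flattened to (subject, object) pairs with no cycles.

```python
from collections import defaultdict


def transitive_closure(edges):
    """Saturate part_of: every node is linked to everything reachable from it."""
    succ = defaultdict(set)
    for s, o in edges:
        succ[s].add(o)
    closure = set()
    for start in list(succ):
        stack = list(succ[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(succ.get(node, ()))
        closure.update((start, o) for o in seen)
    return closure


def restrict_to_terms(closure, terms):
    """Keep only triples with a retained term in both subject and object position."""
    return {(s, o) for s, o in closure if s in terms and o in terms}


def transitive_reduction(closed_edges):
    """Drop (a, c) when some other b gives (a, b) and (b, c).

    Assumes the input is already transitively closed and acyclic.
    """
    succ = defaultdict(set)
    for s, o in closed_edges:
        succ[s].add(o)
    return {
        (s, o)
        for s, o in closed_edges
        if not any(o in succ.get(m, ()) for m in succ[s] if m != o and m != s)
    }


def gap_fill(edges, terms):
    """Steps 4-6: closure, filter to the term file, reduce."""
    return transitive_reduction(restrict_to_terms(transitive_closure(edges), terms))


# Toy data (hypothetical labels, not real ontology IDs).
edges = {
    ("glomerulus", "renal corpuscle"),
    ("renal corpuscle", "nephron"),
    ("nephron", "kidney"),
}
terms = {"glomerulus", "nephron", "kidney"}
print(sorted(gap_fill(edges, terms)))
# → [('glomerulus', 'nephron'), ('nephron', 'kidney')]
```

Here "renal corpuscle" is not in T, so the chain through it is bridged by the inferred glomerulus -> nephron edge, while the redundant glomerulus -> kidney edge is removed by the reduction.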

Soufflé is a core component of the UberGraph generation code - e.g. see pruning script used to generate redundant graph: https://github.com/INCATools/ubergraph/blob/3be8948138d7ed03941435f8f115bc25cd74510a/prune.dl#L31

@matentzn - this is my attempt at the positive version of your algo - which I think was for subset antislim term removal.

dosumis commented 2 years ago

See https://github.com/INCATools/ubergraph/blob/3be8948138d7ed03941435f8f115bc25cd74510a/Makefile#L49 & preceding goal chain for RDF to Soufflé.

Conversion from Turtle requires Jim's relation-graph script; see: https://github.com/INCATools/ubergraph/blob/3be8948138d7ed03941435f8f115bc25cd74510a/Dockerfile#L23

dosumis commented 2 years ago

Plan - get this working in an experimental repo with a small test file. Then work out how to edit scao.Makefile to incorporate.

dosumis commented 2 years ago

Test

We can use indirect part_of relationships here as probes for whether this has worked: https://github.com/hubmapconsortium/ccf-validation-tools/blob/master/logs/class_Kidney_indirect_log.tsv (although not all may appear directly as these are edges specified by ASCT+B table authors).
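
A rough Python sketch of how such a probe check could look: given the part_of edges in the output and a list of expected indirect (part, whole) pairs, report the pairs that are not reachable through any part_of chain. The function name and the toy data are hypothetical, and parsing the HuBMAP TSV is left out because its column layout is not assumed here.

```python
from collections import defaultdict


def missing_probes(part_of_edges, probes):
    """Return the probe pairs (part, whole) with no part_of chain from part to whole."""
    succ = defaultdict(set)
    for s, o in part_of_edges:
        succ[s].add(o)

    def reaches(start, target):
        # Depth-first search over part_of edges.
        stack = list(succ.get(start, ()))
        seen = set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(succ.get(node, ()))
        return False

    return [(part, whole) for part, whole in probes if not reaches(part, whole)]
```

Checking reachability rather than direct edges fits the caveat above: a probe may be satisfied by a chain of part_of links rather than by a single asserted edge.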

dosumis commented 2 years ago

If above test works, the next step is to try to merge this into the existing pipeline. This will require more discussion. It's not clear to me whether the import modules have sufficient axiomatisation for complete gap-filling.

Current steps (as I understand them): Seed + mirrors -> SLME module extraction -> reduce relationships (existential restrictions on classes) to part_of only. We expect to fill in the gaps at this last step, but may have already lost some of the axiomatisation needed in the module extraction step. TBA with @matentzn

anitacaron commented 2 years ago

I created a repository to test this approach: https://github.com/anitacaron/souffle_test

dosumis commented 2 years ago

Draft spec for modifying makefile to fix:

Goal to change:

components/%.owl: imports/%_import.owl components/%_simple_seed.txt $(SCATLAS_KEEPRELATIONS)

https://github.com/EBISPOT/scatlas_ontology/blob/master/src/ontology/scatlas.Makefile#L55 (@anitacaron to update link)

This should be replaced by a series of steps with Souffle - as documented here: https://github.com/anitacaron/souffle_test/blob/main/Makefile

Note - this needs to take the relations from $(SCATLAS_KEEPRELATIONS) rather than hard-wiring them, as it does now.

anitacaron commented 2 years ago

What would the merged ontology be in this case? Is it the mirror file for each component?