Open dosumis opened 2 years ago
See https://github.com/INCATools/ubergraph/blob/3be8948138d7ed03941435f8f115bc25cd74510a/Makefile#L49 & preceding goal chain for RDF to Soufflé.
Conversion from Turtle requires Jim's relation graph script; see: https://github.com/INCATools/ubergraph/blob/3be8948138d7ed03941435f8f115bc25cd74510a/Dockerfile#L23
Plan - get this working in an experimental repo with a small test file. Then work out how to edit scao.Makefile to incorporate it.
Test: use robot filter --select annotations to generate a file of terms with associated annotations. We can use indirect part_of relationships here as probes for whether this has worked: https://github.com/hubmapconsortium/ccf-validation-tools/blob/master/logs/class_Kidney_indirect_log.tsv (although not all may appear directly, as these are edges specified by ASCT+B table authors).
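The probe check could look something like the sketch below. This is only an illustration in Python, not part of the pipeline: the function name missing_probes, the (child, parent) tuple representation, and the example term labels are all assumptions, and the real probes would have to be parsed from the indirect log TSV, whose column layout is not described here.

```python
def missing_probes(inferred_edges, probe_edges):
    """Return the probe (child, parent) pairs that are absent from the inferred part_of edges."""
    inferred = set(inferred_edges)
    return sorted(edge for edge in set(probe_edges) if edge not in inferred)


# Hypothetical example data; real values would come from the pipeline output
# and from the indirect log TSV.
inferred = {("renal cortex", "kidney"), ("nephron", "kidney")}
probes = [("nephron", "kidney"), ("glomerulus", "kidney")]

# Any pair printed here is a probe the gap-filling step failed to recover.
print(missing_probes(inferred, probes))
```

An empty list would suggest the gap-filling recovered every probe; a non-empty list gives the edges to investigate.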
If the above test works, the next step is to try to merge this into the existing pipeline. This will require more discussion. It's not clear to me whether the import modules have sufficient axiomatisation for complete gap-filling.
Current steps (as I understand them): Seed + mirrors -> SLME module extraction -> reduce relationships (existential restriction on classes) to part_of only. We expect to fill in the gaps at this last step, but may have already lost some of the axiomatization needed in the module extraction step. TBA with @matentzn
I created a repository to test this approach: https://github.com/anitacaron/souffle_test
Draft spec for modifying makefile to fix:
Goal to change:
components/%.owl: imports/%_import.owl components/%_simple_seed.txt $(SCATLAS_KEEPRELATIONS)
https://github.com/EBISPOT/scatlas_ontology/blob/master/src/ontology/scatlas.Makefile#L55. (@anitacaron to update link)
This should be replaced by a series of steps with Souffle - as documented here: https://github.com/anitacaron/souffle_test/blob/main/Makefile
Note - needs to take $(SCATLAS_KEEPRELATIONS) rather than hard wiring as now.
What would be the merged ontology in this case? Is it the mirror file for each component?
We need to ensure that SCAO has complete (inferred) chains of part_of relationships between all terms in the seed. The current pipeline attempts to do this using robot filter + a rather complex and confusing set of options. In testing we have become aware that this approach does not produce complete results.
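To make the requirement concrete, here is a minimal Python sketch of the intended result, assuming part_of edges are available as simple (child, parent) pairs. It is a stand-in for the logic only, not the Soufflé or relation graph implementation, and it ignores subClassOf and property chains, which a real run would also need to take into account. The function name and example terms are hypothetical.

```python
from collections import defaultdict


def part_of_closure_over_seed(direct_edges, seed):
    """Compute the transitive closure of part_of, then keep only seed-to-seed pairs."""
    parents = defaultdict(set)
    for child, parent in direct_edges:
        parents[child].add(parent)

    seed = set(seed)
    result = set()
    for start in seed:
        # Walk upwards through every part_of ancestor, including non-seed terms.
        stack = list(parents.get(start, ()))
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(parents.get(node, ()))
        # Only now restrict to the seed, so intermediate terms can bridge gaps.
        result.update((start, anc) for anc in seen if anc in seed and anc != start)
    return result


# Hypothetical example: "nephron" is not in the seed but links the two seed terms.
edges = [("glomerulus", "nephron"), ("nephron", "kidney")]
seed_terms = {"glomerulus", "kidney"}
print(part_of_closure_over_seed(edges, seed_terms))
```

The point of the ordering is that the closure is computed before filtering to the seed. Filtering first would drop both edges in the example, because each touches the non-seed term, and the glomerulus to kidney chain would be lost.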
Proposal: the current approach should be replaced by one using Soufflé, which is now part of the ODK container.
This is roughly the algorithm:
filter --term-file components/$*_simple_seed.txt --select "annotations ontology anonymous self" --trim true --signature true \
Soufflé is a core component of the UberGraph generation code - e.g. see pruning script used to generate redundant graph: https://github.com/INCATools/ubergraph/blob/3be8948138d7ed03941435f8f115bc25cd74510a/prune.dl#L31
@matentzn - this is my attempt at the positive version of your algo, which I think was for subset antislim term removal.