Duplication in .cl files

wceusters commented 1 year ago

There are several axioms that are listed in more than 1 .cl file. It doesn't harm the logic of course, but it may impact the efficiency of certain type of reasoners that use the .cl files as input. Here is the list of duplications: % xlm-1: [[spatial.cl, universal-declaration.cl]] % uns-1: [[continuant-mereology.cl, existence-instantiation.cl]] % suk-1: [[order.cl, temporal-region.cl]] % rzv-1: [[order.cl, temporal-region.cl]] % qqv-1: [[order.cl, temporal-region.cl]] % qnf-1: [[order.cl, temporal-region.cl]] % qga-1: [[order.cl, temporal-region.cl]] % qaf-1: [[existence-instantiation.cl, temporal-region.cl]] % owb-1: [[order.cl, temporal-region.cl]] % njq-1: [[temporal-region.cl, universal-declaration.cl]] % miz-1: [[order.cl, temporal-region.cl]] % fmm-1: [[generic-dependence.cl, participation.cl]] % eeg-1: [[temporal-region.cl, temporalized-relations.cl]] % cgn-1: [[participation.cl, specific-dependency.cl]] % adm-1: [[continuant-mereology.cl, material-entity.cl]] % acg-1: [[order.cl, temporal-region.cl]]

alanruttenberg commented 1 year ago

It shouldn't be a problem with reasoners, but please let me know if you have an example where it does.

The duplication is a known issue which I don't plan to fix. It's a consequence of the fact that some axioms, in the source, are classified in different groups because there wasn't an obvious choice for a single category, and because the choice of what axiom goes in which file is based on the categorization.

I'll leave the issue open in case I decide down the road to modify the categorization.

wceusters commented 1 year ago

It depends on what you call 'problem' of course. Logic-wise, as I said, there isn't any. However. My reasoner works directly with the axioms and converts them to a rule base. Then it uses instance data to create a BFO2020 compatible model.

One issue is processing time. If I give as only input "(exists-at t t)", "t" not being a variable here but an identifier for some not further described occurrent, then a model is created;

with the duplicated axioms included: 3.063 CPU in 3.183 seconds
with duplicated axioms removed: 2.688 CPU in 2.795 seconds for the exact same model. The reason is that the duplicated axioms create redundant rules.

The second issue is finding a minimal model. The method requires introducing unique skolem functions for each remaining existentially quantified variable that remains after converting implications to disjunctions and moving quantifiers outside negations. Applying that to duplicated axioms which end up requiring skolemization creates more skolem functions than needed, hence again more processing time required to make assumptions about possible identity of the individuals the functions return and then figure out whether the smaller model still fits the logic.

It is because of the latter over-generation I became aware of the duplication, so I changed the parser to take it into account. No issue for me anymore, but I lost some time in trying to find programming errors in my reasoner while there weren't any (related to that issue at least).

I also lost some time in trying to figure out why some inconsistency proofs generated by my reasoner on the basis of inconsistent input didn't make sense until I identified that some axiom identifiers were not unique (see other issues now closed): the proofs were correct, but the explanation pulled up the wrong axiom. So again something the parser now checks for.

alanruttenberg commented 1 year ago

Could you keep track of which [xxx-x] as they are loaded, and skip over duplicates should you encounter the same [xxx-x] again?

wceusters commented 1 year ago

Well, if the [xxx-x] were guaranteed to be truly unique, yes, but as I demonstrated they weren't. So now I keep in a table the comments, the axiom-identifiers and the axioms and when all .cl files are parsed, I perform the checks on that table. Parsing is not the problem. I built a DCG and that is fast. For all files together: 0.859 CPU in 0.868 seconds. It also checks during the parse of each individual axiom whether there are grammatical or syntax errors. Generation of the rulebase happens in a 2nd step after the checks, and redundancy is then removed.

alanruttenberg commented 1 year ago

Non-unique shouldn't happen again. I have a check while loading the database as well as check during the release process.

wceusters commented 1 year ago

Good. But anyhow, I will keep the checks as BFO2020 isn't the only input. There is currently also a PhD student who is writing axioms in .cl files for his application ontology. And I am working on getting OGMS fully axiomatized.

BFO-ontology / BFO-2020

Duplication in .cl files #52