VirtualFlyBrain / vfb-pipeline-dumps

Pipeline that creates dumps from the triplestore for consumption by the downstream services
Apache License 2.0

The dumps pipeline produces a file with many triples that ROBOT cannot parse. #23

Open dosumis opened 2 years ago

dosumis commented 2 years ago
robot merge -i /out/raw/all.ttl \
    reason --reasoner ELK --axiom-generators "SubClass EquivalentClass ClassAssertion" --exclude-tautologies structural \
    relax \
    reduce --reasoner ELK --named-classes-only true \
    annotate --ontology-iri "http://virtualflybrain.org/data/VFB/OWL/raw/all.owl" \
    convert -f owl -o /out/raw/construct_all.owl | { grep -v 'OWLRDFConsumer\|InvalidReferenceViolation\|RDFParserRegistry' || true; }

=>

ERROR Input ontology contains 545373 triple(s) that could not be parsed:
 - <http://virtualflybrain.org/reports/VFBc_00101gc1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> _:genid-nodeid-genid-d772490648ac4472932985646f1ab9c7-node1fp2rmh14x1797609.

Possible cause: image http://robot.obolibrary.org/errors

Could the pipeline somehow be producing axioms that follow RDF reification?

Investigating an example:

Here's one of the unparsed triples reported:

<http://virtualflybrain.org/reports/VFB_00029522> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> _:genid-nodeid-genid-4841781bfabd4a7b98ac303327e83a13-node1fpl1cvm3x508939 .

I expected this to have something to do with the blank nodes used for reification in axiom annotation. However, looking at that blank node in the triplestore, it appears to be a simple, unannotated type axiom:

image

Looking at PDB, this type restriction appears to be present:

image

Any idea what's going on? Half a million unparsed triples is at least a bad smell, even if we're not sure of what the consequences might be.
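
To get a first handle on what the unparsed triples have in common, the ROBOT error listing can be tallied by predicate and by kind of object. The following is a minimal Python sketch, not part of the pipeline. It assumes the log lines have the ` - <subject> <predicate> object.` shape shown above, and the function name `summarize_unparsed` is hypothetical.

```python
import re
from collections import Counter

# Matches one reported triple from ROBOT's error listing, e.g.
#  - <subject> <predicate> _:blank-node-id.
# Assumption: subject and predicate are always IRIs in angle brackets.
TRIPLE_RE = re.compile(r"^\s*-\s+<([^>]*)>\s+<([^>]*)>\s+(.*?)\s*\.\s*$")


def summarize_unparsed(log_text):
    """Count reported triples by (predicate, kind of object)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = TRIPLE_RE.match(line)
        if not match:
            continue  # skip the ERROR header and any unrelated log lines
        _subject, predicate, obj = match.groups()
        if obj.startswith("_:"):
            kind = "blank node"
        elif obj.startswith("<"):
            kind = "IRI"
        else:
            kind = "literal"
        counts[(predicate, kind)] += 1
    return counts
```

If every reported triple turns out to be an `rdf:type` statement with a blank node as its object, as in the examples here, that points at anonymous class expressions used as types rather than at reification.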

dosumis commented 2 years ago

I note that the text on the Robot doc says "this is often because", suggesting there are other possible causes.

matentzn commented 2 years ago

If @hkir-dev can grep a tiny example from the dump that causes the ROBOT warning, we can easily find the reason. A very typical problem I have encountered is a complex "source" or "target" in reification, which must be atomic. But one look at a failing minimal example and we can tell.
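
One way such a minimal example could be pulled out of the dump is sketched below in Python. It is only a sketch under stated assumptions: the dump has first been serialized as N-Triples (one triple per line, which `all.ttl` is not guaranteed to be), blank node labels are stable within that file, and the function name `extract_closure` is hypothetical. It collects the triples about one subject plus, transitively, the triples about every blank node reachable from it.

```python
import re

# One N-Triples statement: subject (IRI or blank node), predicate (IRI),
# object (anything up to the terminating dot).
NT_RE = re.compile(r"^\s*(<[^>]*>|_:\S+)\s+(<[^>]*>)\s+(.*?)\s*\.\s*$")


def extract_closure(nt_lines, subject_iri):
    """Return the triples about subject_iri and about all blank nodes
    reachable from it, as N-Triples strings."""
    # Index every parsable line by its subject. This holds the whole
    # file in memory, which is fine for a sketch but not for huge dumps.
    by_subject = {}
    for line in nt_lines:
        match = NT_RE.match(line)
        if match:
            by_subject.setdefault(match.group(1), []).append(match.groups())

    pending = ["<" + subject_iri + ">"]
    seen = set()
    result = []
    while pending:
        node = pending.pop()
        if node in seen:
            continue
        seen.add(node)
        for s, p, o in by_subject.get(node, []):
            result.append(f"{s} {p} {o} .")
            if o.startswith("_:"):
                pending.append(o)  # follow the blank node
    return result
```

Writing the returned lines to a small file and running ROBOT on it should show whether the warning reproduces for that one individual.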

hkir-dev commented 2 years ago

Minimal example attached.

ROBOT version: 1.8.3

Command: robot -vvv reason -i minimal.ttl reason --reasoner ELK -o minimal_reason.owl

The command succeeds, but we still see the error log:

ERROR Input ontology contains 3 triple(s) that could not be parsed:
 - <http://virtualflybrain.org/reports/VFB_00000001> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> _:genid-nodeid-genid-674b1cc8101e47ceb845067892d7fd8e-node1fpb4lttix277296.
 - <http://virtualflybrain.org/reports/VFB_00000001> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> _:genid-nodeid-genid-674b1cc8101e47ceb845067892d7fd8e-node1fpb4lttix277295.
 - <http://virtualflybrain.org/reports/VFB_00000001> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> _:genid-nodeid-genid-674b1cc8101e47ceb845067892d7fd8e-node1fpb4lttix277297.

minimal.zip minimal2.zip

matentzn commented 2 years ago

The output seems to be noise, see https://github.com/ontodev/robot/issues/965

The triples parse and convert just fine.

matentzn commented 2 years ago

Sorry, didn't mean to close it - reflex. Up to you :)