AKSW / QuitDiff

Command-line comparison tool for Semantic Web data; it can also be used as a git difftool for RDF data.
GNU General Public License v3.0

Graphs are not recognized as isomorphic if datatype IRIs differ in unicode char encoding #8

Closed JJ-Author closed 5 years ago

JJ-Author commented 5 years ago

python3 ~/difftest/QuitDiff/bin/quit-diff --diffFormat=eccrev . datatypeIRI-escape.nt 1 2 datatypeIRI-wrapper.nt

returns the following result, although the graphs should be isomorphic. Also note the escape, which is used in the datatype IRI of the insert but not of the delete, and which is not used in the raw data.

ns1:insert {
    <http://ar.dbpedia.org/resource/إكشاف_مثاني> ns2:meshnumber "12.74"^^<http://dbpedia.org/datatype/nicaraguanC\u00F3rdoba> .

    <http://ar.dbpedia.org/resource/متلازمة_مارشال> ns2:meshid "536025.0"^^<http://dbpedia.org/datatype/nicaraguanC\u00F3rdoba> .
}
ns1:delete {
    <http://ar.dbpedia.org/resource/إكشاف_مثاني> ns2:meshnumber "12.74"^^<http://dbpedia.org/datatype/nicaraguanCórdoba> .

    <http://ar.dbpedia.org/resource/متلازمة_مارشال> ns2:meshid "536025.0"^^<http://dbpedia.org/datatype/nicaraguanCórdoba> .
}
cat datatypeIRI-wrapper.nt 
<http://ar.dbpedia.org/resource/\u0623\u0643\u062A\u0648\u0628\u0627\u0645\u064A\u0646> <http://ar.dbpedia.org/property/kegg> "4227.0"^^<http://dbpedia.org/datatype/nicaraguanC\u00F3rdoba> .
<http://ar.dbpedia.org/resource/\u0625\u0643\u0634\u0627\u0641_\u0645\u062B\u0627\u0646\u064A> <http://ar.dbpedia.org/property/meshnumber> "12.74"^^<http://dbpedia.org/datatype/nicaraguanC\u00F3rdoba> .
<http://ar.dbpedia.org/resource/\u0645\u062A\u0644\u0627\u0632\u0645\u0629_\u0645\u0627\u0631\u0634\u0627\u0644> <http://ar.dbpedia.org/property/meshid> "536025.0"^^<http://dbpedia.org/datatype/nicaraguanC\u00F3rdoba> .
cat datatypeIRI-escape.nt
<http://ar.dbpedia.org/resource/أكتوبامين> <http://ar.dbpedia.org/property/kegg> "4227.0"^^<http://dbpedia.org/datatype/nicaraguanC\u00F3rdoba> .
<http://ar.dbpedia.org/resource/إكشاف_مثاني> <http://ar.dbpedia.org/property/meshnumber> "12.74"^^<http://dbpedia.org/datatype/nicaraguanCórdoba> .
<http://ar.dbpedia.org/resource/متلازمة_مارشال> <http://ar.dbpedia.org/property/meshid> "536025.0"^^<http://dbpedia.org/datatype/nicaraguanCórdoba> .
white-gecko commented 5 years ago

Well, the escape is used in the raw data, as I can see in your example. But we do seem to have a problem with the translation of datatype IRIs.

JJ-Author commented 5 years ago

Well, maybe I need to restate this to make it clear: the insert uses no escaping for the subject IRIs (although the new file, datatypeIRI-wrapper.nt, uses escapes there), but in the datatype IRI the escapes are kept (the escape codes are not replaced).

white-gecko commented 5 years ago

This is slightly related to: https://github.com/RDFLib/rdflib/issues/792 .

white-gecko commented 5 years ago

I think it is fine that QuitDiff understands both escaped and unescaped input and identifies them as equal. It is also OK that it returns the result with unescaped characters, as we see for the subject. Our aim is to output a canonical form (cf. https://www.w3.org/TR/n-triples/#canonical-ntriples). But it is not correct that it doesn't handle the datatype IRI in the same way.
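The expected behaviour can be illustrated with a small standard-library sketch. It is not how QuitDiff or rdflib implement parsing; it only decodes the `\uXXXX` / `\UXXXXXXXX` escapes inside anything that looks like an IRI and then compares the statements as sets. The helper names (`decode_uchar`, `normalize_line`, `diff`) are hypothetical, and treating every `<...>` span as an IRI is a simplification that would misfire on literals containing angle brackets.

```python
import re

# UCHAR escapes from the N-Triples grammar:
# \uXXXX (four hex digits) or \UXXXXXXXX (eight hex digits).
UCHAR = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")
# Simplification: anything between angle brackets is treated as an IRI.
IRIREF = re.compile(r"<([^<>\s]*)>")


def decode_uchar(text):
    """Replace each UCHAR escape with the character it denotes."""
    return UCHAR.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), text)


def normalize_line(line):
    """Decode escapes inside every IRI of one N-Triples line."""
    return IRIREF.sub(lambda m: "<" + decode_uchar(m.group(1)) + ">", line.strip())


def diff(old_lines, new_lines):
    """Return (deleted, inserted) statements after normalizing both inputs."""
    old = {normalize_line(l) for l in old_lines if l.strip()}
    new = {normalize_line(l) for l in new_lines if l.strip()}
    return old - new, new - old


# One statement from the report, once with escapes and once without.
escaped = [
    r'<http://ar.dbpedia.org/resource/\u0625\u0643\u0634\u0627\u0641_\u0645\u062B\u0627\u0646\u064A> '
    r'<http://ar.dbpedia.org/property/meshnumber> '
    r'"12.74"^^<http://dbpedia.org/datatype/nicaraguanC\u00F3rdoba> .',
]
unescaped = [
    '<http://ar.dbpedia.org/resource/إكشاف_مثاني> '
    '<http://ar.dbpedia.org/property/meshnumber> '
    '"12.74"^^<http://dbpedia.org/datatype/nicaraguanCórdoba> .',
]

deleted, inserted = diff(unescaped, escaped)
print("deleted:", deleted)
print("inserted:", inserted)
```

Because the subject and the datatype IRI go through the same decoding step, neither form of the statement should show up as a change, which is the behaviour asked for in this issue.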

white-gecko commented 5 years ago

I've reported your issue upstream: https://github.com/RDFLib/rdflib/issues/859 and have provided a fix for it: https://github.com/RDFLib/rdflib/pull/860

white-gecko commented 5 years ago

Since this is merged now, we need to find out how to specify a dependency on the current master branch of rdflib.

white-gecko commented 5 years ago

@JJ-Author: Are you able to find this out?

JJ-Author commented 5 years ago

sudo pip3 install git+git://github.com/RDFLib/rdflib.git@master seems to work for the moment, although it prints some warnings:

sudo pip3 install git+git://github.com/RDFLib/rdflib.git@master
Downloading/unpacking git+git://github.com/RDFLib/rdflib.git@master
Cloning git://github.com/RDFLib/rdflib.git (to master) to /tmp/pip-s8mo4s0b-build
Running setup.py (path:/tmp/pip-s8mo4s0b-build/setup.py) egg_info for package from git+git://github.com/RDFLib/rdflib.git@master
/usr/lib/python3/dist-packages/setuptools/dist.py:333: UserWarning: Normalizing '5.0.0-dev' to '5.0.0.dev0'
normalized_version,
warning: no files found matching 'ez_setup.py'
no previously-included directories found matching 'docs/_build'
warning: no previously-included files matching '.pyc' found anywhere in distribution
warning: no previously-included files matching '$py.class' found anywhere in distribution
Requirement already satisfied (use --upgrade to upgrade): isodate in /usr/local/lib/python3.4/dist-packages (from rdflib==5.0.0.dev0)
Requirement already satisfied (use --upgrade to upgrade): pyparsing in /usr/local/lib/python3.4/dist-packages (from rdflib==5.0.0.dev0)
Requirement already satisfied (use --upgrade to upgrade): six in /usr/lib/python3/dist-packages (from rdflib==5.0.0.dev0)
Installing collected packages: rdflib
Found existing installation: rdflib 4.2.1
Uninstalling rdflib:
Successfully uninstalled rdflib
Running setup.py install for rdflib
/usr/lib/python3/dist-packages/setuptools/dist.py:333: UserWarning: Normalizing '5.0.0-dev' to '5.0.0.dev0'
normalized_version,
warning: no files found matching 'ez_setup.py'
no previously-included directories found matching 'docs/_build'
warning: no previously-included files matching '.pyc' found anywhere in distribution
warning: no previously-included files matching '$py.class' found anywhere in distribution
Installing rdfs2dot script to /usr/local/bin
Installing rdfgraphisomorphism script to /usr/local/bin
Installing rdfpipe script to /usr/local/bin
Installing csv2rdf script to /usr/local/bin
Installing rdf2dot script to /usr/local/bin
Could not find .egg-info directory in install record for rdflib==5.0.0.dev0 from git+git://github.com/RDFLib/rdflib.git@master
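To record the dependency in the project instead of installing it by hand, one option is a PEP 508 direct reference in `install_requires`. The following is a minimal sketch that assumes a setuptools-based `setup.py`. It is not taken from QuitDiff, the variable names are illustrative, and the `https` URL scheme is an assumption that differs from the `git://` URL used in the command above.

```python
# Hypothetical excerpt for a setuptools-based setup.py (not taken from QuitDiff).
# A PEP 508 direct reference points the requirement at the rdflib master branch.
RDFLIB_REPO = "https://github.com/RDFLib/rdflib.git"  # assumed URL scheme
RDFLIB_REF = "master"

install_requires = [
    "rdflib @ git+{}@{}".format(RDFLIB_REPO, RDFLIB_REF),
]

# In a real setup.py this list would be passed on, for example:
#   from setuptools import setup
#   setup(name="...", install_requires=install_requires)
print(install_requires[0])
```

A branch reference like this moves with every upstream commit, so switching to a released rdflib version once the fix ships would make installs reproducible again.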