DeciSym / oxigraph

SPARQL graph database
Apache License 2.0

resolve blank node errors in oxigraph hdt implementation #7

Open GregHanson opened 6 months ago

GregHanson commented 6 months ago

Current Test Failures from branch https://github.com/DeciSym/oxigraph/pull/8

failures:
    tests::i18n::normalization_1
    tests::open_world::open_cmp_01
    tests::open_world::open_cmp_02
    tests::sort::dawg_sort_8
GregHanson commented 6 months ago

i18n::normalization_1

The current blank node implementation works for generated blank nodes with the _:bXXX prefix. The HDT file in our fork that we test against was generated with the hdt-java library. hdt-cpp fails to parse the source file because it rejects the empty <> on line 11:

$ rdf2hdt -f ttl -p -v testsuite/rdf-tests/sparql/sparql10/i18n/normalization-01.ttl test.hdt
Detected RDF input format: ttl
Catch exception load: ERROR: Could not convert triple to IDS!
 http://www.w3.org/2000/01/rdf-schema#comment "Normalized and non-normalized IRIs"
0 1 9
ERROR: ERROR: Could not convert triple to IDS!
 http://www.w3.org/2000/01/rdf-schema#comment "Normalized and non-normalized IRIs"
0 1 9

By contrast, the Java implementation generates the file without errors as is, except that the blank node prefix is _:@:

$ hdtSearch oxhdt-sys/tests/resources/rdf-tests/sparql/sparql10/i18n/normalization-01_copy.hdt
Predicate Bitmap in 259 us
Count predicates in 103 us
Count Objects in 26 us Max was: 1
Bitmap in 345 us
Bitmap bits: 9 Ones: 9
Object references in 37 us
Sort lists in 992 us
Index generated in 2 ms 171 us
>> ? ? ?
_:@0 http://www.w3.org/2001/sw/DataAccess/tests/data/i18n/normalization.ttl#resumé "Alice's normalized resumé"
_:@0 http://xmlns.com/foaf/0.1/name "Alice"
_:@1 http://www.w3.org/2001/sw/DataAccess/tests/data/i18n/normalization.ttl#resumé "Bob's non-normalized resumé"
_:@1 http://xmlns.com/foaf/0.1/name "Bob"

The HDT library has issues handling the @ character. Populating the empty <> on line 11 with, for example, <http://test.com> lets the hdt-cpp conversion succeed and produces blank nodes with a different prefix, which the HDT library accepts:

$ hdtSearch test.hdt
>> ? ? ?
_:b1 http://www.w3.org/2001/sw/DataAccess/tests/data/i18n/normalization.ttl#resumé "Alice's normalized resumé"
_:b1 http://xmlns.com/foaf/0.1/name "Alice"
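
The workaround above could be automated as a preprocessing step before running rdf2hdt. The following is only a minimal sketch, not part of oxigraph or hdt-cpp: the function name fill_empty_iri_refs and the sample input are hypothetical, and the replacement is a naive text substitution that would also touch a <> inside a string literal. Note that <> normally resolves to the base IRI, so substituting a placeholder changes the data.

```rust
/// Replaces every empty IRI reference `<>` in a Turtle document with
/// `<placeholder>`. Hypothetical helper: a plain text substitution,
/// not a Turtle-aware rewrite.
fn fill_empty_iri_refs(turtle: &str, placeholder: &str) -> String {
    // Build the replacement token, e.g. "<http://test.com>".
    let replacement = format!("<{placeholder}>");
    turtle.replace("<>", &replacement)
}
```

For example, fill_empty_iri_refs(source, "http://test.com") would turn a line starting with <> into one starting with <http://test.com>, which matches the manual edit described above.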

Questions

  1. hdt-java includes this exact normalization file in its test suite. Should hdt-cpp support it too? https://github.com/rdfhdt/hdt-java/blob/43a55a6ec1fb9c549253f51cb761f13f7f3a5bbb/hdt-jena/testing/DAWG/i18n/normalization-01.ttl
  2. Why does the HDT library reject the current solution when the blank node is prefixed with _:@?
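
One possible factor for question 2: the Turtle and N-Triples grammars do not allow @ in a blank node label, so _:@0 is not a valid label in those syntaxes. Whether that is what trips up the HDT library is unconfirmed. If it is, one option would be to rewrite the hdt-java style labels when terms are read. The sketch below assumes a hypothetical helper named normalize_blank_node_label. It is not an existing oxigraph or HDT API, and it could collide with labels that already use the _:b prefix in the same file.

```rust
/// Rewrites an hdt-java style blank node label such as `_:@0` to `_:b0`.
/// Any other term is returned unchanged. Hypothetical helper for
/// illustration only.
fn normalize_blank_node_label(term: &str) -> String {
    match term.strip_prefix("_:@") {
        // Keep the identifier after the `@` and swap in the `b` prefix.
        Some(rest) => format!("_:b{rest}"),
        None => term.to_string(),
    }
}
```

With this mapping, the _:@0 and _:@1 subjects from the hdt-java output would become _:b0 and _:b1, while IRIs and literals pass through untouched.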
GregHanson commented 6 months ago

Related hdt-cpp issue for handling <>: https://github.com/rdfhdt/hdt-cpp/issues/281

donpellegrino commented 6 months ago

Depends on #9.