dbpedia / gstore

Git repo / triple store hybrid graph storage
Apache License 2.0

Bug on upload (Virtuoso only) - pending fix from virtuoso team #24

Open manonthegithub opened 2 years ago

manonthegithub commented 2 years ago

https://github.com/dbpedia/databus-transfer/issues/1

manonthegithub commented 2 years ago

Likely to be linked to: https://github.com/openlink/virtuoso-opensource/issues/571

holycrab13 commented 2 years ago

Very likely not a gstore issue

Could you make a gstore version where it logs out all the drop graph and insert graph statements? I will try to reproduce it with standalone virtuoso then

Probably linked to the issue you posted, but it has been unsolved since 2016 - let's hope for the best :D

Branch of the databus-transfer repo to reproduce the issue: https://github.com/dbpedia/databus-transfer/tree/insert-stopped-debug

manonthegithub commented 2 years ago

@holycrab13 added the GSTORE_LOG_LEVEL env var, you can now set GSTORE_LOG_LEVEL=DEBUG in docker-compose to enable logging of queries
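For reference, a minimal sketch of how that might look in the compose file (the service and image names are assumptions; only the env var itself comes from the comment above):

```yaml
services:
  gstore:
    image: dbpedia/gstore        # assumed image name, for illustration only
    environment:
      - GSTORE_LOG_LEVEL=DEBUG   # enables logging of the executed queries
```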

manonthegithub commented 2 years ago

Ok, very strange issue: the query size doesn't matter. It happens on different queries, randomly. I mean: if I split the file into inserts of 100 triples, it fails randomly on different parts, sometimes on the first 200 triples, sometimes on 900 triples. No idea what the problem is; it is a bug in Virtuoso, we can make a ticket there.
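The batching described above can be sketched roughly as follows. Everything here is illustrative: the sample data, file names, and graph IRI are made up, the batch size is shrunk to 2, and the isql-vt invocation is shown as a comment rather than executed.

```shell
# Sketch of the fixed-size batching used to narrow down the failing inserts;
# sample data, file names, and the graph IRI are assumptions.
printf '<urn:s> <urn:p> "v%s" .\n' 1 2 3 4 5 > data.nt   # 5 sample triples
split -l 2 data.nt batch_                                # fixed-size batches
for f in batch_*; do
  # wrap each batch in its own SPARQL INSERT so a failing batch can be isolated
  printf 'sparql INSERT IN GRAPH <http://example.org/g> { %s } ;\n' \
    "$(cat "$f")" > "$f.sparql"
  # a real run would then execute: isql-vt 1111 dba password VERBOSE=ON $f.sparql
  echo "prepared $f.sparql"
done
```

With real data, the first batch whose isql-vt run fails pinpoints the offending triples.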

Restarting Virtuoso and gstore and saving some other files first helps.

manonthegithub commented 2 years ago

Posted a repro to https://github.com/openlink/virtuoso-opensource/issues/571, hope they will be able to fix this soon

kurzum commented 2 years ago

ok, I also reproduced the bug now. I made a test set for bash:

isql-vt 1111 dba password VERBOSE=ON i1.sparql.txt > 1.1.txt 2>1.2.txt
isql-vt 1111 dba password VERBOSE=ON i2.sparql.txt > 2.1.txt 2>2.2.txt

i1.sparql.txt i2.sparql.txt

kurzum commented 2 years ago

I split the triples and ran them individually:

while read p; do
  echo "----------------"
  echo "$p"
  # wrap the single triple in its own SPARQL INSERT and run it via isql-vt
  isql-vt 1111 dba password VERBOSE=ON exec="sparql INSERT IN GRAPH <http://localhost:3002/g/test/mappings-geo-coordinates-mappingbased-2018.09.12-dataid.jsonld> { $p } ;"
  echo "----------------"
done <triples.txt

Seems like it is definitely the preview triples. When run individually, they throw syntax errors: ri2.txt

Then I split the triples of i2 into no preview (i3) and only preview (i4): i3.sparql.txt i4.sparql.txt
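That split can be sketched with grep; the sample data, predicate substring, and file names below are assumptions, not the actual attachments:

```shell
# Hypothetical illustration of splitting a triples file into "no preview"
# and "preview only" parts; the data and the predicate IRI are made up.
printf '%s\n' '<urn:s> <urn:label> "x" .' '<urn:s> <urn:preview> "y" .' > i2.nt
grep -v '<urn:preview>' i2.nt > i3.nt   # triples without the preview property
grep    '<urn:preview>' i2.nt > i4.nt   # only the preview triples
```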

Then I tested it again:

# The no-preview triples (i3) are loaded first. This seems to initialize the DB
# properly and set up the graph. Loading i1 and i2 afterwards still throws
# '-- More than 0 parameters, ignoring all the rest of the statement #line 1 "i2.sparql.txt"'
# but they no longer corrupt the store.
isql-vt 1111 dba password VERBOSE=ON i3.sparql.txt > 3.1.txt 2>3.2.txt
isql-vt 1111 dba password VERBOSE=ON i1.sparql.txt > 1.1.txt 2>1.2.txt
isql-vt 1111 dba password VERBOSE=ON i2.sparql.txt > 2.1.txt 2>2.2.txt
# Running i1 or i2 first, which contain the preview property, messes up the store:
isql-vt 1111 dba password VERBOSE=ON i1.sparql.txt > 1.1.txt 2>1.2.txt
isql-vt 1111 dba password VERBOSE=ON i2.sparql.txt > 2.1.txt 2>2.2.txt
isql-vt 1111 dba password VERBOSE=ON i3.sparql.txt > 3.1.txt 2>3.2.txt

Conclusion: overall this seems to be an encoding thing. ODBC/JDBC have certain control and macro characters like $. The preview triple was originally created by me in the old Maven upload client. Back then I already had trouble creating these, as it is, until now, unclear to me what exactly I need to escape/encode when putting RDF inside RDF as a literal. This is compounded by the different available syntaxes (N-Triples, Turtle, RDF/XML), plus they also have to go into SPARQL, which is yet another syntax, and I am not sure if SPARQL INSERT is exactly like Turtle or differs in details.
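As a rough illustration of the escaping problem (the predicate and data below are made up): when a whole triple is stored as the object literal of another triple, the inner double quotes must become `\"` and any inner backslashes `\\` for the result to be a valid N-Triples or SPARQL string literal.

```shell
# Hypothetical sketch: embed one RDF triple as a literal inside another.
# Backslashes are escaped first, then double quotes, so the inner statement
# survives as a valid N-Triples / SPARQL string literal.
inner='<urn:a> <urn:b> "hello" .'
escaped=$(printf '%s' "$inner" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
printf '<urn:s> <urn:preview> "%s" .\n' "$escaped"
```

Note this only covers the literal-level escaping; feeding the result through isql-vt adds further quoting layers (shell and ISQL), which may be where characters like $ bite.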

Solution suggestions:

Still uncertain:

manonthegithub commented 2 years ago

@kurzum it would be better to post it there: https://github.com/openlink/virtuoso-opensource/issues/571 It really happens at different moments and even in different places in the same data.