Open reckart opened 3 years ago
@reckart
hm, the template description of "issue still valid" doesn't seem clear.
This issue has been fixed as can be seen here:
Anyhow. de.dbpedia.org/sparql needs fresher data. Web Extraction and latest downloads are clean.
@kurzum thanks for the response. How old is the data on de.dbpedia.org? We already had the issue back in May 2019 if not earlier.
That said: at least de.dbpedia.org doesn't fail on certain SPARQL queries as dbpedia.org appears to do since the recent Virtuoso upgrade.
@kurzum do you know if this was a systematic bug in the code that builds DBpedia, or is it something that could bite users again on another triple (not Hans Lala)?
@kurzum thanks for the response. How old is the data on de.dbpedia.org? We already had the issue back in May 2019 if not earlier.
I don't know how old it is exactly. We are switching to the new system, where all the files are produced monthly and versioned with the Databus. Then you would know exactly what files from which month are loaded.
That said: at least de.dbpedia.org doesn't fail on certain SPARQL queries as dbpedia.org appears to do since the recent Virtuoso upgrade.
Well you say this. Probably, if we update de.dbpedia.org we will get issues that the missing rdf:langString makes queries fail ;)
@kurzum do you know if this was a systematic bug in the code that builds DBpedia, or is it something that could bite users again on another triple (not Hans Lala)?
We build one of the biggest data test frameworks. Please read "The New DBpedia Release Cycle: Increasing Agility and Efficiency in Knowledge Extraction Workflows".
So this one is covered. Not all tests run perfectly, but we made it so that mvn test fails if a fixed issue reoccurs. The challenge is mammoth. A full release has 22 billion triples, and loading them into an application adds an additional layer of problems. It is much more complex and harder than fixing bugs in software alone.
We are currently getting this on the road, in particular figure 1 of the paper. You saw the new templates for issues. The goal here is to bring down the time to verify, locate, and fix an issue to 30 minutes, which would melt down the thousand small problems everywhere.
The reason I am asking whether this issue was fixed systematically is that over at RDF4J, I am lobbying for making the SPARQL results parser a bit more robust/lenient in the face of this particular issue (i.e. langString without lang), so that it still parses the result but returns it as a string instead of a langString.
What do you think? Is this kind of problem one that the data providers should have to fix or should query results parsers such as the one in RDF4J be able to gracefully handle such problems with the data?
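To make the "lenient" option concrete, here is a minimal sketch in Python. It is not RDF4J code and does not reflect how RDF4J implements or would implement this. It assumes the results arrive in the SPARQL 1.1 JSON results format, where a literal is a dict with `type`, `value`, and optionally `datatype` or `xml:lang`. The function name `normalize_literal` is hypothetical.

```python
# Datatype IRIs as defined by RDF 1.1 and XML Schema.
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def normalize_literal(term: dict) -> dict:
    """Return a copy of one SPARQL JSON result term.

    A literal typed rdf:langString that has no (or an empty) language tag
    is downgraded to xsd:string. Everything else is copied unchanged.
    """
    if term.get("type") != "literal":
        return dict(term)
    has_lang = bool(term.get("xml:lang"))
    if term.get("datatype") == RDF_LANGSTRING and not has_lang:
        # Drop an empty language tag, if any, and fall back to a plain string.
        fixed = {k: v for k, v in term.items() if k != "xml:lang"}
        fixed["datatype"] = XSD_STRING
        return fixed
    return dict(term)
```

The input term is never modified, so a caller could still log or report the original, invalid form while working with the repaired one.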
That said: at least de.dbpedia.org doesn't fail on certain SPARQL queries as dbpedia.org appears to do since the recent Virtuoso upgrade.
Well you say this. Probably, if we update de.dbpedia.org we will get issues that the missing rdf:langString makes queries fail ;)
I'm more referring to this particular issue here which I believe appears to be a bug in the Virtuoso query compiler: https://github.com/dbpedia/extraction-framework/issues/672
Funny - I just noticed this other report about langString/string issues just a bit down in the issue list: https://github.com/dbpedia/extraction-framework/issues/603
What do you think? Is this kind of problem one that the data providers should have to fix or should query results parsers such as the one in RDF4J be able to gracefully handle such problems with the data?
Neither. I think that we need better debugging tools, e.g. the framework we describe in the paper has these kinds of tests, and they can be transferred well to other data. I would see the problem in:
I'm more referring to this particular issue here which I believe appears to be a bug in the Virtuoso query compiler: #672
which is already being fixed (not sure about priority)
Well, as a person "in the middle" who is neither producing the data nor developing the RDF libraries, having (slightly) invalid data and strict RDF libraries essentially would lock me out from using the semantic resources. As such, considering that having perfect data is very hard and making RDF libraries more resilient is at least a realistic possibility, I think I'll continue to lobby for the latter. That doesn't mean that data and related tooling should not become better, but it means the data becomes more accessible while perfection is being worked towards ;)
refiled under hosting
Issue still valid?
Not sure what you want me to validate here. You can validate the issue using the "execute query" link below.
Source
Web / SPARQL
Here is a link to reproduce the issue: execute query.
Error Description
The query returns a literal tagged as a langString, but it does not include a language. This is invalid according to the RDF specs. (Ref: https://github.com/eclipse/rdf4j/issues/2815)
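One way to check a result set for this condition is sketched below in Python. It assumes the endpoint's response is in the SPARQL 1.1 JSON results format. The function name `find_untagged_langstrings` and the sample response are made up for illustration and are not taken from the actual query output.

```python
import json

RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


def find_untagged_langstrings(results: dict) -> list:
    """List (row index, variable name) pairs whose literal is typed
    rdf:langString but carries no language tag."""
    hits = []
    bindings = results.get("results", {}).get("bindings", [])
    for row, binding in enumerate(bindings):
        for var, term in binding.items():
            if (term.get("type") == "literal"
                    and term.get("datatype") == RDF_LANGSTRING
                    and not term.get("xml:lang")):
                hits.append((row, var))
    return hits


# Hypothetical response body, for illustration only.
sample = json.loads("""
{
  "head": {"vars": ["label"]},
  "results": {"bindings": [
    {"label": {"type": "literal", "value": "example",
               "datatype": "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"}},
    {"label": {"type": "literal", "value": "Beispiel", "xml:lang": "de"}}
  ]}
}
""")

print(find_untagged_langstrings(sample))  # prints [(0, 'label')]
```

Only the first row is reported: its literal claims the langString datatype without a language, while the second row is a well-formed language-tagged literal.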
Error specification
I would guess that omitting the datatype or changing it to string should probably work.
Expected / corrected RDF outcome snippet (NTRIPLES):
Additional context