Open mkroetzsch opened 9 years ago
I can help fix this bug (I am the one who reported it). It would be great if you could point out which Java class needs to be modified.
There are also other bugs where some triples contain "\110" or "\", etc., which cause encoding issues.
Great, thanks. The issue is caused by this code.
I don't know what could cause the encoding issues, since we use OpenRDF for character encoding, and it should make sure that all string content is valid. Best open another bug for this if you have more information.
OK. I'll open another issue about character encoding later this evening.
The year-zero correction should be removed from our export, since it was converting XML Schema 1.1 to XML Schema 1.0 format. First of all, we want our exports to conform to XSD 1.1 (and thus to RDF 1.1). Secondly, it is currently unclear if the date encoding in the JSON is in XSD 1.1 or XSD 1.0 or in a mix of the two :-(. WMDE is working on clarifying how dates can be restored to follow a standard, but right now historic Wikidata dates should not be considered to be exact to the year, esp. in the BCE range.
Moreover, note that there is now issue #133 for tracking the character encoding issue.
Exporting a date like "4th century BC" (see https://www.wikidata.org/wiki/Q4318) leads to wrongly formatted XSD literals: "-400" rather than "-0400". The reason is that the formatting method for negative years in TimeValueConverter.java is only invoked for dates with at least year precision. This is because the formatting code is currently coupled to the year-zero correction code, which is only meaningful for year-level precision. The fix will be to separate year-zero correction from negative-year formatting and have two if-statements instead.
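The proposed separation could look roughly like the sketch below. This is not the actual code in TimeValueConverter.java: the class name, the method `formatYear`, the `applyYearZeroCorrection` flag, and the precision constants are hypothetical names chosen for illustration. The direction of the correction (shifting years at or below zero by one, XSD 1.1 style to XSD 1.0 style) is also an assumption. The point is only that the padding of negative years no longer depends on the precision check.

```java
import java.util.Locale;

// Illustrative sketch only; all names here are hypothetical.
final class NegativeYearFormatSketch {
    // Assumed precision codes: 7 = century, 9 = year.
    static final byte PREC_CENTURY = 7;
    static final byte PREC_YEAR = 9;

    static String formatYear(long year, byte precision, boolean applyYearZeroCorrection) {
        // First if-statement: year-zero correction, which is only
        // meaningful for dates that are exact to the year or finer.
        // Assumption: the correction shifts years <= 0 down by one.
        if (applyYearZeroCorrection && precision >= PREC_YEAR && year <= 0) {
            year = year - 1;
        }

        // Second if-statement: negative-year formatting, applied at
        // every precision so that "-400" becomes "-0400".
        if (year < 0) {
            return "-" + String.format(Locale.ROOT, "%04d", Math.abs(year));
        }
        return String.format(Locale.ROOT, "%04d", year);
    }
}
```

With this split, a century-precision value such as -400 is padded to "-0400" whether or not the year-zero correction is enabled, so the correction can later be dropped without touching the formatting branch.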