chile12 closed this issue 8 years ago
Could you provide a few more details? Or maybe post your extraction.xyz.properties file. Which extractors did you run? An .nt or .ttl file (or a part of one) would also be very helpful. Do these errors occur only in the .nt files or also in the .ttl files?
The double backslash in the error message is rather strange. Looks like the URI has been backslash-escaped twice.
My properties file:
base-dir=C:/Users/Chile/Desktop/testDumps
source=pages-articles.xml.bz2
languages=de
extractors=.MappingExtractor, .InfoboxExtractor
format.nt.bz2=n-triples;uri-policy.default
ontology=../ontology.xml
mappings=../mappings
I can only report on .nt files, where URIs look like this:
There's nothing wrong with http://de.dbpedia.org/resource/\u03A9-Bromacetophenon - it's the NT-escaped version of the IRI http://de.dbpedia.org/resource/Ω-Bromacetophenon .
Until recently, NT didn't allow non-ASCII characters; they had to be escaped as \uXXXX sequences. See #291 for details.
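To illustrate the escaping described above, here is a minimal Python sketch. It is not the extraction framework's code: `nt_escape` and `escape_backslashes` are hypothetical helpers, and only non-ASCII characters are handled (classic N-Triples escapes a few other characters as well). The second step only reproduces what a second round of backslash-escaping would look like; whether that is what actually happens in the extraction is an assumption.

```python
def nt_escape(iri: str) -> str:
    """Escape non-ASCII characters as \\uXXXX / \\UXXXXXXXX (simplified sketch)."""
    out = []
    for ch in iri:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                # plain ASCII is kept as-is
        elif cp <= 0xFFFF:
            out.append("\\u%04X" % cp)    # BMP character: 4 hex digits
        else:
            out.append("\\U%08X" % cp)    # beyond the BMP: 8 hex digits
    return "".join(out)


def escape_backslashes(text: str) -> str:
    """Hypothetical second escaping pass that doubles every backslash."""
    return text.replace("\\", "\\\\")


# Correct, single escaping: Ω (U+03A9) becomes \u03A9.
once = nt_escape("http://de.dbpedia.org/resource/Ω-Bromacetophenon")
print(once)

# Escaping the already escaped string again yields the double backslash
# that appears in the BAD URI message.
twice = escape_backslashes(
    nt_escape("http://de.dbpedia.org/resource/Friedel_Tiekötter")
)
print(twice)
print(twice.index("\\"))  # position of the first backslash
```

If the double-escaping guess is right, the position of the first backslash in the second string should line up with the index reported in the error message.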
Please post excerpts from your NT files, especially a few full lines where "BAD URI" occurs. For example, I'd like to know whether they occur in subject or object position.
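If it helps with gathering those excerpts, a rough Python sketch like the following could report where a doubled escape shows up in each line. The regular expressions are a deliberately crude approximation of N-Triples syntax, `positions_with_double_escape` is a hypothetical helper, and the sample lines (including the `http://example.org/p` predicate) are made up for illustration rather than taken from a real dump.

```python
import re

# Crude N-Triples line pattern: <subject> <predicate> object .
TRIPLE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(.*?)\s*\.\s*$')

# Two literal backslashes followed by u/U and at least four hex digits.
DOUBLE_ESCAPE = re.compile(r'\\\\[uU][0-9A-Fa-f]{4}')


def positions_with_double_escape(line: str) -> list:
    """Return the triple positions whose IRI contains a doubled escape."""
    match = TRIPLE.match(line)
    if not match:
        return []  # not a line this sketch understands
    subj, pred, obj = match.groups()
    hits = []
    if DOUBLE_ESCAPE.search(subj):
        hits.append("subject")
    if DOUBLE_ESCAPE.search(pred):
        hits.append("predicate")
    # Only IRI objects are checked; literals are ignored here.
    if obj.startswith("<") and DOUBLE_ESCAPE.search(obj):
        hits.append("object")
    return hits


# Made-up sample lines, not real extraction output.
sample_lines = [
    '<http://de.dbpedia.org/resource/Friedel_Tiek\\\\u00F6tter> '
    '<http://example.org/p> <http://de.dbpedia.org/resource/X> .',
    '<http://de.dbpedia.org/resource/X> <http://example.org/p> '
    '<http://de.dbpedia.org/resource/Friedel_Tiek\\\\u00F6tter> .',
    '<http://de.dbpedia.org/resource/\\u03A9-Bromacetophenon> '
    '<http://example.org/p> "ok" .',
]

for line in sample_lines:
    print(positions_with_double_escape(line), line[:60])
```

For a real dump, the same function could be applied to each line read through `bz2.open(path, "rt", encoding="utf-8")`.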
Also, please post your full properties file. The definition of uri-policy.default is missing in the excerpt above...
Which version of the code are you using? The latest master branch from GitHub?
Fixed in the latest version.
The extraction of the de-dump results in a lot of these: BAD URI: Illegal character in path at index 43: http://de.dbpedia.org/resource/Friedel_Tiek\\u00F6tter... Please have a look into this.