dbpedia / extraction-framework

The software used to extract structured data from Wikipedia

Invalid RDF in changesets #585

Open kurzum opened 5 years ago

kurzum commented 5 years ago

I ran rapper over the changesets at http://downloads.dbpedia.org/live/changesets/2019/ with the following command:

find changesets/2019 | grep 'nt.gz$' | xargs zcat | rapper -i ntriples -c - http://base.org 2>&1 | lbzip2 -zc > /var/www/live.dbpedia.org/rapper-changesets-2019.bz2

Rapper log: http://95.217.42.166/rapper-changesets-2019.bz2

The following issues occurred:

rapper: Error - URI http://base.org:939898 column 48 - URI 'http://en.wikipedia.org/wiki/New_Years_Eve_7"' contains bad character(s)
rapper: Error - URI http://base.org:940138 column 175 - URI 'http://en.wikipedia.org/w/index.php?title=New_Years_Eve_7"&action=history' contains bad character(s)
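
Both errors point at the same cause: a literal double quote inside an IRI, which the N-Triples grammar does not allow between `<` and `>`. As a rough illustration of a pre-check that could flag such lines before they reach a changeset, here is a minimal Python sketch. It is not part of the extraction framework. The function name `find_bad_iris` and the sample triple (including its subject and predicate) are made up for illustration, and the tokenizer is deliberately naive: it treats every `<...>` span as an IRI, so a `<` inside a string literal would be misread.

```python
import re

# Characters the N-Triples IRIREF production forbids inside <...>:
# code points 0x00-0x20 and < > " { } | ^ ` \
_ILLEGAL = re.compile(r'[\x00-\x20<>"{}|^`\\]')

# \uXXXX and \UXXXXXXXX escapes are allowed in IRIs, so drop them before checking.
_UCHAR = re.compile(r'\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8}')

# Naive IRI token: '<', then everything up to the next '>'.
_IRI_TOKEN = re.compile(r'<([^>]*)>')


def find_bad_iris(line):
    """Return (column, iri) pairs for IRIs on one line that contain forbidden characters.

    The column is the 1-based position of the opening '<'.
    """
    bad = []
    for match in _IRI_TOKEN.finditer(line):
        iri = match.group(1)
        if _ILLEGAL.search(_UCHAR.sub('', iri)):
            bad.append((match.start() + 1, iri))
    return bad


# Hypothetical sample triple; only the object IRI is taken from the log above.
sample = ('<http://example.org/s> <http://example.org/p> '
          '<http://en.wikipedia.org/wiki/New_Years_Eve_7"> .')
for column, iri in find_bad_iris(sample):
    print(f"column {column}: {iri}")
```

A check like this only reports the problem. Whether the offending character should be percent-encoded (`%22`) or the triple dropped is a separate decision for the changeset writer.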

m1ci commented 4 years ago

@Vehnem construct validator?