dbpedia / extraction-framework

The software used to extract structured data from Wikipedia

non-English dump - java.util.NoSuchElementException #428

Closed: sandroacoelho closed this issue 8 years ago

sandroacoelho commented 8 years ago

Hi guys, I'm trying to process a non-English dump and I am facing the following error:

```
WARNING: Error parsing title: found namespace 102/Namespace 102, expected 0/Main in title Anexo:Lista de rugbiers do Brasil
Exception in thread "main" java.util.NoSuchElementException: key not found: 2600
    at scala.collection.MapLike$class.default(MapLike.scala:228)
    at scala.collection.AbstractMap.default(Map.scala:58)
    at scala.collection.MapLike$class.apply(MapLike.scala:141)
    at scala.collection.AbstractMap.apply(Map.scala:58)
    at org.dbpedia.extraction.sources.WikipediaDumpParser.readPage(WikipediaDumpParser.java:219)
    at org.dbpedia.extraction.sources.WikipediaDumpParser.readPages(WikipediaDumpParser.java:187)
    at org.dbpedia.extraction.sources.WikipediaDumpParser.readDump(WikipediaDumpParser.java:145)
    at org.dbpedia.extraction.sources.WikipediaDumpParser.run(WikipediaDumpParser.java:116)
    at org.dbpedia.extraction.sources.XMLReaderSource.foreach(XMLSource.scala:65)
    at org.dbpedia.spotlight.io.AllOccurrenceSource$AllOccurrenceSource.foreach(AllOccurrenceSource.scala:79)
    at org.dbpedia.spotlight.filter.Filter$FilteredOccs.foreach(Filter.scala:58)
    at org.dbpedia.spotlight.filter.Filter$FilteredOccs.foreach(Filter.scala:58)
    at org.dbpedia.spotlight.filter.Filter$FilteredOccs.foreach(Filter.scala:58)
    at org.dbpedia.spotlight.io.FileOccurrenceSource$.writeToFile(FileOccurrenceSource.scala:55)
    at org.dbpedia.spotlight.lucene.index.ExtractOccsFromWikipedia$.main(ExtractOccsFromWikipedia.scala:80)
    at org.dbpedia.spotlight.lucene.index.ExtractOccsFromWikipedia.main(ExtractOccsFromWikipedia.scala)
```
Is there anything that I'm doing obviously wrong?

Best,

jimkont commented 8 years ago

You can run generate-settings [1] to update the entries.

[1] https://github.com/dbpedia/extraction-framework/wiki/Extraction-Instructions#generate-settings
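
For context, `key not found: 2600` is the message a Scala `Map` produces when `apply` is called with a key it does not contain, which is what the top frames of the trace (`MapLike.default`, `MapLike.apply`) point to. The key is presumably a namespace code missing from the settings in use. Below is a minimal sketch of that failure mode, not the framework's actual code: `knownNamespaces`, `strictLookup`, and `safeLookup` are hypothetical names, and the table holds only the two namespaces named in the warning above.

```scala
import scala.util.Try

object NamespaceLookupSketch {
  // Hypothetical stand-in for a namespace table; the entries are taken
  // from the warning in the log, not from the real generated settings.
  val knownNamespaces: Map[Int, String] =
    Map(0 -> "Main", 102 -> "Namespace 102")

  // Map.apply throws java.util.NoSuchElementException for a missing key.
  def strictLookup(code: Int): String = knownNamespaces(code)

  // Map.get returns an Option, so a missing code can be handled explicitly.
  def safeLookup(code: Int): Option[String] = knownNamespaces.get(code)

  def main(args: Array[String]): Unit = {
    println(safeLookup(102))
    println(safeLookup(2600))
    // Wrap the strict lookup in Try to show the failure without crashing.
    println(Try(strictLookup(2600)).failed.map(_.getMessage))
  }
}
```

The sketch only illustrates why a missing entry aborts the whole run. Regenerating the settings as suggested above is the fix for the entries themselves.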