dkpro / dkpro-jwktl

Java Wiktionary Library
http://dkpro.org/dkpro-jwktl/
Apache License 2.0
NullPointerException when filtering by language #1

Closed chmeyer closed 9 years ago

chmeyer commented 9 years ago

Originally reported on Google Code with ID 1

What steps will reproduce the problem?
1. Download the dump 'dewiktionary-20130818-pages-articles.xml.bz2'
2. Parse it with 'JWKTL.parseWiktionaryDump(dumpFile, outputDirectory, true);'
3. Run the snippet given as an example on http://code.google.com/p/jwktl/wiki/JWKTLUseCases

  IWiktionaryEdition wkt = JWKTL.openEdition(WIKTIONARY_DIRECTORY);
  WiktionaryEntryFilter filter = new WiktionaryEntryFilter();
  filter.setAllowedWordLanguages(Language.GERMAN);
  filter.setAllowedPartsOfSpeech(PartOfSpeech.ADJECTIVE);
  int deAdjectiveCount = 0;
  for (IWiktionaryEntry entry : wkt.getAllEntries(filter)) {
    System.out.println(entry.getWord());
    deAdjectiveCount++;
  }
  System.out.println("German adjectives: " + deAdjectiveCount);
  wkt.close();

What is the expected output? What do you see instead?

I expect all German adjectives to be printed, followed by their count.
Instead, I get a NullPointerException. Here is the stack trace:

Exception in thread "main" java.lang.NullPointerException
    at java.util.TreeMap.getEntry(TreeMap.java:342)
    at java.util.TreeMap.containsKey(TreeMap.java:227)
    at java.util.TreeSet.contains(TreeSet.java:234)
    at de.tudarmstadt.ukp.jwktl.api.filter.WiktionaryEntryFilter.acceptWordLanguage(WiktionaryEntryFilter.java:103)
    at de.tudarmstadt.ukp.jwktl.api.filter.WiktionaryEntryFilter.accept(WiktionaryEntryFilter.java:149)
    at de.tudarmstadt.ukp.jwktl.api.entry.WiktionaryEdition$1.fetchNext(WiktionaryEdition.java:97)
    at de.tudarmstadt.ukp.jwktl.api.entry.WiktionaryEdition$1.fetchNext(WiktionaryEdition.java:81)
    at de.tudarmstadt.ukp.jwktl.api.util.WiktionaryIterator.hasNext(WiktionaryIterator.java:43)
    at WiktionarySearcher.main(WiktionarySearcher.java:24)

What version of the product are you using? On what operating system?

I am using version 1.0.0 on Ubuntu.

Best

Abou

Reported by abdoulaye.drame@neofonie.de on 2013-08-21 15:06:15

chmeyer commented 9 years ago
That's an error, indeed. Here's a quick fix for version 1.0.0:

{{{
IWiktionaryEdition wkt = JWKTL.openEdition(WIKTIONARY_DIRECTORY);
WiktionaryEntryFilter filter = new WiktionaryEntryFilter(){
  @Override
  protected boolean acceptWordLanguage(final IWiktionaryEntry entry) {
    if (entry.getWordLanguage() == null)
      return false;

    return super.acceptWordLanguage(entry);
  }
  @Override
  protected boolean acceptPartOfSpeech(final IWiktionaryEntry entry) {
    if (entry.getPartOfSpeech() == null)
      return false;

    return super.acceptPartOfSpeech(entry);
  }
};
filter.setAllowedWordLanguages(Language.GERMAN);
filter.setAllowedPartsOfSpeech(PartOfSpeech.ADJECTIVE);
int deAdjectiveCount = 0;
for (IWiktionaryEntry entry : wkt.getAllEntries(filter)) {
  System.out.println(entry.getWord());
  deAdjectiveCount++;
}
System.out.println("German adjectives: " + deAdjectiveCount);
wkt.close();
}}}

A proper fix is available in the current head revision.

Reported by chmeyer.de on 2013-08-21 15:20:45

chmeyer commented 9 years ago
A possible hint: some entries contain no information about the word language,
that is to say, "entry.getWordLanguage() == null", and this leads to an exception
when the filter is applied.
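
To illustrate the suspected cause: the stack trace above ends in TreeSet.contains, and a TreeSet with natural ordering throws a NullPointerException when asked whether it contains null. The following is a minimal standalone sketch of that behaviour and of a null guard. It is not JWKTL code: the class NullSafeContains and both helper methods are hypothetical names, and plain strings stand in for the language objects the filter stores.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo class, not part of JWKTL.
public class NullSafeContains {

    /** Returns true only if key is non-null and present in the set. */
    static <T> boolean containsNullSafe(Set<T> allowed, T key) {
        // The null check short-circuits before contains() is ever called.
        return key != null && allowed.contains(key);
    }

    /** Returns true if an unguarded contains(null) call throws an NPE. */
    static boolean plainContainsThrowsOnNull(Set<String> allowed) {
        try {
            allowed.contains(null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Strings are an assumption; they stand in for the allowed word languages.
        Set<String> allowed = new TreeSet<>();
        allowed.add("German");

        System.out.println(plainContainsThrowsOnNull(allowed));  // prints "true"
        System.out.println(containsNullSafe(allowed, null));     // prints "false"
        System.out.println(containsNullSafe(allowed, "German")); // prints "true"
    }
}
```

An entry without a word language would then simply be rejected by the filter instead of aborting the iteration.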

Reported by abdoulaye.drame@neofonie.de on 2013-08-21 15:21:06

chmeyer commented 9 years ago
Thanks for the quick reply.
I would like to check out the fix, but one dependency cannot be found when I try
to build the artifact using 'mvn install'. It is the following one:

xerces:xercesImpl:jar:2.9.1-lucene

Missing:
----------
1) xerces:xercesImpl:jar:2.9.1-lucene

  Try downloading the file manually from the project website.

Reported by abdoulaye.drame@neofonie.de on 2013-08-21 15:25:29

chmeyer commented 9 years ago
Could you please open separate issues for different problems? Xerces should come directly
via Maven or is available from http://xerces.apache.org/xerces2-j/.

Reported by chmeyer.de on 2013-08-21 15:34:27

chmeyer commented 9 years ago

Reported by chmeyer.de on 2013-08-21 15:38:26

chmeyer commented 9 years ago
Oh, sorry, I missed that. I will open separate issues for different problems from now
on.

Reported by abdoulaye.drame@neofonie.de on 2013-08-21 15:40:46