demidko / aot

Russian morphology analyzer for Java | Morphological dictionary of the Russian language for Java
https://jitpack.io/#demidko/aot
MIT License

ArrayIndexOutOfBoundsException on some Russian words #2

Closed: orchestr7 closed this issue 1 year ago

orchestr7 commented 1 year ago
// kotlin  
println(lookupForMeanings("ежи")[0].morphology)

causes:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Index 25 out of bounds for length 24
    at com.github.demidko.aot.HashDictionary.getFlexionTags(HashDictionary.java:116)
    at com.github.demidko.aot.WordformMeaning.getMorphology(WordformMeaning.java:67)
    at com.github.demidko.aot.MainKt.main(Main.kt:9)
    at com.github.demidko.aot.MainKt.main(Main.kt)

This happens because of invalid indexing in HashDictionary.getFlexionTags:

  public MorphologyTag[] getFlexionTags(int lemmaId, int flexionIndex) {
    return allMorphologyTags[lemmas[lemmaId][flexionIndex * 2 + 1]]; // <--
  }
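
The numbers in the stack trace fit this expression: with flexionIndex = 12, `flexionIndex * 2 + 1` is 25, one past the end of a 24-element row. Below is a minimal sketch of a bounds-checked lookup. It assumes each row of `lemmas` stores pairs with the tag index in the second slot of each pair; the thread does not show the real layout, and `FlexionIndexDemo` and `tagIndexAt` are hypothetical names, not part of the library.

```java
import java.util.OptionalInt;

public class FlexionIndexDemo {

  // Assumed layout (not confirmed by the thread): a lemma row holds pairs,
  // and the tag index of pair i is stored at position i * 2 + 1.
  public static OptionalInt tagIndexAt(int[] lemmaRow, int flexionIndex) {
    // A row of length n holds n / 2 complete pairs, so valid indices are 0 .. n/2 - 1.
    if (flexionIndex < 0 || flexionIndex >= lemmaRow.length / 2) {
      return OptionalInt.empty();
    }
    return OptionalInt.of(lemmaRow[flexionIndex * 2 + 1]);
  }

  public static void main(String[] args) {
    int[] lemmaRow = new int[24]; // 12 pairs, valid flexion indices 0..11
    System.out.println(tagIndexAt(lemmaRow, 11).isPresent()); // position 23: prints true
    System.out.println(tagIndexAt(lemmaRow, 12).isPresent()); // position 25: prints false
  }
}
```

A check like this only turns the crash into an explicit "no value" result. It does not explain why an out-of-range flexion index was produced in the first place.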
orchestr7 commented 1 year ago

There are 1274956 problematic words in the vocabulary:

    // Collect every word form whose morphology lookup throws.
    val failing = mutableSetOf<String>()
    getDictionary().allFlexionStrings.asSequence().forEach {
        try {
            lookupForMeanings(it)[0].morphology
        } catch (e: ArrayIndexOutOfBoundsException) {
            failing.add(it)
        }
    }
    println(failing.size)
demidko commented 1 year ago

@akuleshov7 Thank you for reporting this! I'll fix it.

demidko commented 1 year ago

@akuleshov7 I checked it in the relevant branch, here's what came up:

  @Test
  void testSomeRussianWords() throws IOException {
    System.out.println(
      lookupForMeanings("ежи").get(0).getMorphology()
    );
  }

Result:

[С, мр, им, мн]
demidko commented 1 year ago

@akuleshov7 Make sure you are using the latest version of the library.

dependencies {
  implementation("com.github.demidko:aot:2022.08.06")
}
orchestr7 commented 1 year ago

yeah, the bug comes from an older version, thx

orchestr7 commented 1 year ago

In the newer version there isn't even a WordformMeaning.getDictionary() method to get all the words

demidko commented 1 year ago

@akuleshov7 I have added two new methods: WordformMeaning::getAllFlexions and WordformMeaning::listAllWordforms.
You need to include an even newer version of the library:

dependencies {
  implementation("com.github.demidko:aot:2022.11.16")
}