dhfbk / tint

The Italian NLP Tool
http://tint.fbk.eu
GNU General Public License v3.0

ArrayIndexOutOfBound in GuessModel #33

Closed (mastrofinimanuel closed this issue 3 years ago)

mastrofinimanuel commented 4 years ago

Hi, the following code excerpt can throw the exception in the title:


```java
for (int i = 0; i < min; i++) {
    char charForm = form.charAt(i);
    char charLemma = lemma.charAt(i);
    if (charForm != charLemma || i == min - 1) {
        String postfix = lemma.substring(i);
        int length = token.length() - form.length();
        /// ADD A CHECK HERE
        String prefix = token.substring(0, i + length);
        ///////////
        guessed_lemma = prefix + postfix;
        break;
    }
}
```

Example of text causing the issue: 

> 10 pezzi 16Ga Needles. 10 x 1 ml siringa + 10 pezzi 16Ga aghi

You may want to check that `i + length` is a valid end index for `token` before calling `substring`.

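
A minimal sketch of such a check, not Tint's actual code. It assumes `min` is the shorter of the `form` and `lemma` lengths and that returning a caller-supplied fallback is acceptable when the prefix cannot be cut. The class `GuessLemmaSketch`, the method `guessLemma`, the `fallback` parameter, and the sample inputs are all hypothetical names introduced for illustration:

```java
final class GuessLemmaSketch {

    // Hypothetical helper mirroring the loop above, with a bounds check added.
    // Returns `fallback` whenever the prefix cannot be taken from `token`.
    public static String guessLemma(String token, String form, String lemma, String fallback) {
        // Assumption: min is the length of the shorter of form and lemma.
        int min = Math.min(form.length(), lemma.length());
        for (int i = 0; i < min; i++) {
            char charForm = form.charAt(i);
            char charLemma = lemma.charAt(i);
            if (charForm != charLemma || i == min - 1) {
                String postfix = lemma.substring(i);
                int length = token.length() - form.length();
                int end = i + length;
                // The added check: `end` goes negative when token is shorter
                // than form and the strings diverge early.
                if (end < 0 || end > token.length()) {
                    return fallback;
                }
                return token.substring(0, end) + postfix;
            }
        }
        // Empty form or lemma: nothing to compare.
        return fallback;
    }

    public static void main(String[] args) {
        // Token longer than form: the prefix "ri" is preserved.
        System.out.println(guessLemma("ricantavano", "cantavano", "cantare", "ricantavano"));
        // Token shorter than form, early mismatch: the unchecked code would throw here.
        System.out.println(guessLemma("ml", "millilitri", "litro", "ml"));
    }
}
```

Whether falling back to the original token is the right behaviour for the guesser is a design choice for the maintainers; the sketch only shows where the index can become invalid.
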
ziorufus commented 4 years ago

Thank you for reporting the issue, we'll check as soon as possible.

mastrofinimanuel commented 4 years ago

Let me add a new example causing the issue: "Dischetti levatrucco make up MAREB in cotone idrofilo 100PZ  Dischetti per togliete il trucco Materiale 100% cotone idrofilo Quantità 100 dischetti Codice:210140."

ziorufus commented 3 years ago

Fixed in the new version of Tint.