arty-name / hunspell-merge

Software for merging several hunspell dictionaries. Former location:
https://code.google.com/p/hunspell-merge/
GNU Lesser General Public License v3.0

Exception when parsing Mozilla italian dictionary file #9

Open lorenzos opened 3 years ago

lorenzos commented 3 years ago

Mozilla's Italian dictionary file contains some "comments" at the top (lines starting with /), which cause a parsing error when merging dictionaries:

java.lang.ArrayIndexOutOfBoundsException: Index 0 out of bounds for length 0
    at hunspell.merge.DicReader.readLine(DicReader.java:32)
    at hunspell.merge.FileReader.readFile(FileReader.java:29)
    at hunspell.merge.DictionaryFile.readFiles(DictionaryFile.java:113)
    at hunspell.merge.HunspellMerge.createDictionariesImpl(HunspellMerge.java:339)
    at hunspell.merge.HunspellMerge.access$1500(HunspellMerge.java:25)
    at hunspell.merge.HunspellMerge$9$1.run(HunspellMerge.java:323)
    at java.base/java.lang.Thread.run(Thread.java:829)

I think the issue is that hunspell-merge searches for lines in the form word/FLAGS and expects the word part to always be non-empty. Here are the top lines of the Italian file that causes the exception:

95421
/ "Dizionario italiano" add-on for Mozilla products.
/
/ Forked from: "Estensione linguistica italiana - Italian Writing Aids
/ extension" version 5.1, see README.txt for more details.
/
/ Copyright (C) 2001, 2002 Gianluca Turconi
/ [...]
/
/ You should have received a copy of the GNU General Public
/ License along with the "Estensione linguistica italiana - Italian
/ Writing Aids extension"; if not, see <http://www.gnu.org/licenses/>.
a
ab
abaco/OTqr
Abacuc
abadessa/QTUqrs

I think hunspell-merge should simply ignore lines like these, where the word part is empty.
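
A possible explanation for the exact exception, offered as an assumption because the source of DicReader.readLine is not shown here: in Java, "/".split("/") returns an empty array (trailing empty strings are dropped), so reading element 0 of that result for a line containing only / would fail with "Index 0 out of bounds for length 0". Below is a minimal sketch of a more tolerant line parser. The names DicLineSketch, Entry, and parseLine are hypothetical and not part of hunspell-merge, and escaped slashes (\/) inside words are not handled.

```java
import java.util.Optional;

// Hypothetical sketch, not the actual hunspell-merge code.
final class DicLineSketch {

    /** Hypothetical holder for one dictionary entry: a word plus its flags. */
    static final class Entry {
        final String word;
        final String flags;

        Entry(String word, String flags) {
            this.word = word;
            this.flags = flags;
        }
    }

    /**
     * Parses one line of the form "word/FLAGS" or "word".
     * Returns an empty Optional for blank lines and for lines whose
     * word part is empty (for example "/" or "/ some comment").
     */
    static Optional<Entry> parseLine(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return Optional.empty();
        }
        int slash = trimmed.indexOf('/');
        if (slash == 0) {
            // Empty word part: treat the line as a comment and skip it.
            return Optional.empty();
        }
        if (slash < 0) {
            // No flags given, the whole line is the word.
            return Optional.of(new Entry(trimmed, ""));
        }
        return Optional.of(
                new Entry(trimmed.substring(0, slash), trimmed.substring(slash + 1)));
    }

    public static void main(String[] args) {
        // A few lines taken from the excerpt above.
        String[] lines = {"/", "/ Copyright (C) 2001, 2002 Gianluca Turconi", "a", "abaco/OTqr"};
        for (String line : lines) {
            Optional<Entry> entry = parseLine(line);
            if (entry.isPresent()) {
                System.out.println(entry.get().word + " [" + entry.get().flags + "]");
            } else {
                System.out.println("skipped: " + line);
            }
        }
    }
}
```

Using indexOf instead of split avoids the empty-array case entirely, and returning an Optional leaves it to the caller to decide whether skipped lines should be dropped silently or logged.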

arty-name commented 3 years ago

Thank you for the report! Unfortunately I am unable to help.

This tool was created by other people, and I used it years ago. When Google Code was shut down, I preserved the original repository by migrating it to GitHub.

I am not able to maintain this code, so you would have to address any issues yourself or find a Java developer to do so.

I’ve added this information to the ReadMe and will now archive this repository to avoid confusion in the future.