In the lm module, reading a large n-gram set whose count is greater than Integer.MAX_VALUE / blockSize causes a crash, because count * blockSize is computed in int arithmetic and overflows.
Looking at the code, in the constructor of GramDataArray:
    while (l < count * blockSize) {
        pageCounter++;
        l += (pageLength * blockSize);
    }
    data = new byte[pageCounter][];
    int total = 0;
    for (int i = 0; i < pageCounter; i++) {
        if (i < pageCounter - 1) {
            data[i] = new byte[pageLength * blockSize];
            total += pageLength * blockSize;
        } else {
            data[i] = new byte[count * blockSize - total];
        }
        dis.readFully(data[i]);
    }
This should be corrected to:
    while (l < (long) count * blockSize) {
        pageCounter++;
        l += (pageLength * blockSize);
    }
    data = new byte[pageCounter][];
    long total = 0;
    for (int i = 0; i < pageCounter; i++) {
        if (i < pageCounter - 1) {
            data[i] = new byte[pageLength * blockSize];
            total += pageLength * blockSize;
        } else {
            data[i] = new byte[(int) ((long) count * blockSize - total)];
        }
        dis.readFully(data[i]);
    }
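To make the overflow easier to see in isolation, here is a minimal standalone sketch. It is not the actual GramDataArray code: the class name PageSizeSketch, its methods, and the sample values (count = 300,000,000, blockSize = 8, pageLength = 2^24) are assumptions chosen only for illustration. It compares the int and long products and computes page sizes the same way the corrected loop does, without allocating the pages or reading any data.

```java
public class PageSizeSketch {

    // Mirrors the original expression: the product is evaluated in
    // 32-bit int arithmetic and silently wraps around on overflow.
    static int totalBytesInt(int count, int blockSize) {
        return count * blockSize;
    }

    // Mirrors the proposed fix: widening one operand to long makes
    // the whole multiplication happen in 64-bit arithmetic.
    static long totalBytesLong(int count, int blockSize) {
        return (long) count * blockSize;
    }

    // Computes the byte size of every page, following the corrected loop.
    // All pages except the last hold pageLength blocks.
    static int[] pageSizes(int count, int blockSize, int pageLength) {
        long totalBytes = (long) count * blockSize;
        // A single page must still fit in an int, since it backs a byte[].
        int pageBytes = Math.multiplyExact(pageLength, blockSize);

        int pageCounter = 0;
        long l = 0;
        while (l < totalBytes) {
            pageCounter++;
            l += pageBytes;
        }

        int[] sizes = new int[pageCounter];
        long total = 0;
        for (int i = 0; i < pageCounter; i++) {
            if (i < pageCounter - 1) {
                sizes[i] = pageBytes;
                total += pageBytes;
            } else {
                // The remainder is at most pageBytes, so the cast is safe.
                sizes[i] = (int) (totalBytes - total);
            }
        }
        return sizes;
    }

    public static void main(String[] args) {
        int count = 300_000_000;   // hypothetical n-gram count
        int blockSize = 8;         // hypothetical bytes per n-gram
        int pageLength = 1 << 24;  // hypothetical blocks per page

        System.out.println("int product:  " + totalBytesInt(count, blockSize));
        System.out.println("long product: " + totalBytesLong(count, blockSize));

        int[] sizes = pageSizes(count, blockSize, pageLength);
        System.out.println("pages: " + sizes.length
                + ", last page bytes: " + sizes[sizes.length - 1]);
    }
}
```

With these sample values the int product wraps to a negative number, while the long product is 2,400,000,000 and the data is split into 18 pages. The sketch uses Math.multiplyExact for the per-page size so that an oversized page would fail loudly instead of wrapping; that guard is an addition of this sketch, not part of the proposed patch.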
Please check.
Br Bojie