BitFunnel / mg4j-workbench

Java tools for evaluating BitFunnel performance compared to an mg4j baseline.
GNU Lesser General Public License v3.0

Crash in ChunkDocument.tryParseStream while attempting to read filtered version of GX229 #29

Open MikeHopcroft opened 7 years ago

MikeHopcroft commented 7 years ago

This crash happens in ChunkDocument.tryParseStream() when the statement buffer[writeCursor++] = (byte)c; attempts to write past the end of buffer. The buffer was sized at 256KB, based on the assumption that gov2 documents are truncated at 256KB.

The document that causes the crash has length 357895, which exceeds the 262144 bytes (256KB) available in buffer. This document was encountered while processing GX229-1000-1500.chunk, a version of GX229.chunk that BitFunnel filtered to contain documents with unique posting counts from 1000 to 1500.
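One possible way to avoid the overrun is to grow the buffer on demand instead of relying on the 256KB limit. The following is only a minimal sketch under that assumption, not the actual ChunkDocument code: the class GrowableReadBuffer and its methods append, length, capacity and readAll are hypothetical names introduced for illustration, and only the names buffer and writeCursor come from the statement quoted above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical sketch: a byte buffer that grows instead of overflowing.
// This is NOT the real ChunkDocument implementation.
final class GrowableReadBuffer {
    // Initial capacity matches the 256KB assumption described in this issue.
    static final int INITIAL_CAPACITY = 256 * 1024;

    private byte[] buffer = new byte[INITIAL_CAPACITY];
    private int writeCursor = 0;

    // Appends one byte, doubling the backing array when it is full.
    void append(int c) {
        if (writeCursor == buffer.length) {
            buffer = Arrays.copyOf(buffer, buffer.length * 2);
        }
        buffer[writeCursor++] = (byte) c;
    }

    // Number of bytes written so far.
    int length() {
        return writeCursor;
    }

    // Current size of the backing array.
    int capacity() {
        return buffer.length;
    }

    // Reads the stream to EOF, one byte at a time, into a new buffer.
    static GrowableReadBuffer readAll(InputStream in) throws IOException {
        GrowableReadBuffer result = new GrowableReadBuffer();
        int c;
        while ((c = in.read()) != -1) {
            result.append(c);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Simulate a document of the length reported in this issue.
        InputStream in = new ByteArrayInputStream(new byte[357895]);
        GrowableReadBuffer doc = readAll(in);
        System.out.println("read " + doc.length()
            + " bytes, capacity " + doc.capacity());
    }
}
```

An alternative with different trade-offs would be to keep the fixed buffer and reject or truncate any document that reaches the limit, which preserves the memory bound but drops or shortens oversized documents.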

Some observations:

My leading theory is that the original gov2 GX229 directory contains a bundle (.txt file) with a document that tikka represents as longer than 256KB.

MikeHopcroft commented 7 years ago

I have no evidence that this crash is related to BitFunnel issue 387, but I mention it here just in case.