Java tools for evaluating BitFunnel performance compared to an mg4j baseline.
GNU Lesser General Public License v3.0
UTF-8 to UTF-16 conversion in ChunkWordReader.next() is incorrect. #33
Open
MikeHopcroft opened 7 years ago
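This code casts each byte to char, so multi-byte UTF-8 sequences are never decoded: every byte of such a sequence becomes its own, wrong, UTF-16 code unit. It only appears to work because the gov2 corpus is mostly ASCII, where each character is a single byte.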
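
To illustrate the problem, here is a minimal sketch that contrasts a per-byte cast with decoding through the standard library. The class `Utf8DecodeSketch` and the methods `castEachByte` and `decodeUtf8` are hypothetical names made up for this example. They are not the actual code in `ChunkWordReader.next()`, and the sketch assumes the bytes of one word are already available as a `byte[]` slice.

```java
import java.nio.charset.StandardCharsets;

public class Utf8DecodeSketch {
    // Hypothetical stand-in for the behavior described above:
    // every byte is turned into one char, with no UTF-8 decoding.
    static String castEachByte(byte[] utf8) {
        StringBuilder sb = new StringBuilder(utf8.length);
        for (byte b : utf8) {
            // Bytes >= 0x80 are negative in Java, so the cast sign-extends
            // them (0xC3 becomes '\uFFC3') instead of decoding the sequence.
            sb.append((char) b);
        }
        return sb.toString();
    }

    // One possible fix: let the JDK decode the whole byte range as UTF-8.
    static String decodeUtf8(byte[] utf8, int offset, int length) {
        return new String(utf8, offset, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] ascii = "gov2".getBytes(StandardCharsets.UTF_8);
        // "caf\u00e9" is "cafe" with an acute accent; the last character
        // takes two bytes in UTF-8 (0xC3 0xA9).
        byte[] accented = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        // Pure ASCII survives the cast unchanged.
        System.out.println(castEachByte(ascii).equals("gov2"));          // true
        // The two-byte sequence becomes two unrelated chars.
        System.out.println(castEachByte(accented).equals("caf\u00e9")); // false
        System.out.println(castEachByte(accented).length());             // 5
        // Proper decoding recovers the original four-character string.
        System.out.println(decodeUtf8(accented, 0, accented.length)
                .equals("caf\u00e9"));                                   // true
    }
}
```

Decoding a whole word at once only works if the word boundary is found on the raw bytes before decoding. If the reader has to decode incrementally, a `java.nio.charset.CharsetDecoder` would be the usual alternative.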