Closed (MikeHopcroft closed 7 years ago)
Commit ebfee71f41e28d333daf4964b29cc5fcaab2e42c removed commas as well. These aren't a problem for the query parser, but they would require escaping in the query performance results output file, which is in CSV format.
Also removing '/' and coalescing multiple spaces into one.
The 2006 TREC Terabyte Topics contain the following characters that are illegal in mg4j (and BitFunnel) queries: '-', ';', ':', '\'', and '+'. Right now QueryLogRunner.LoadQueries() replaces each of these characters with a space.
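For illustration, a regex-based replacement of this kind might look roughly like the sketch below. This is not the code from LoadQueries(); the function name ReplaceIllegalCharacters is hypothetical, and the character class simply mirrors the five characters listed above.

```cpp
#include <regex>
#include <string>

// Hypothetical helper, not BitFunnel's actual implementation.
// Replaces each of '-', ';', ':', '\'' and '+' with a single space.
std::string ReplaceIllegalCharacters(std::string const & query)
{
    // The '-' is escaped so that it is read as a literal character
    // rather than as a range operator inside the bracket expression.
    static const std::regex illegal("[\\-;:'+]");
    return std::regex_replace(query, illegal, " ");
}
```

Each illegal character maps to exactly one space here, so adjacent illegal characters leave runs of spaces behind; for example, "a-b" would become "a b" and "c++" would become "c" followed by two spaces.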
We should preprocess these input files to remove these characters, and then update LoadQueries() to remove the regex code.