BitFunnel / mg4j-workbench

Java tools for evaluating BitFunnel performance compared to an mg4j baseline.
GNU Lesser General Public License v3.0

Description of mg4j index configuration #25

Open MikeHopcroft opened 7 years ago

MikeHopcroft commented 7 years ago

This issue tracks how we configure / intend to configure the mg4j index for maximum performance in the experiment.

  1. [Not implemented] Disable positions.
  2. Disable scoring.
  3. [Not verified] Use BitStreamHPIndexReader. Actually, we probably want to use the subclass InMemoryHPIndex. Check whether this is used by default. Right now it looks like the code uses QuasiSuccinctIndex.
  4. [Not verified] Use in-memory index. See the Javadoc for Index.UriKeys.
  5. [Not verified] Use wired index.
  6. No stemming.
  7. No stop word elimination.
  8. ??? Disable advanced queries (e.g. near, WAND, phrase).
  9. ??? Disable forward index storage for titles.
  10. ??? Disable forward index storage for BM25F scoring information.
  11. Exporter for Partitioned Elias-Fano index generates a frequency of 1 for every posting.

MikeHopcroft commented 7 years ago

The Javadoc for DiskBasedIndex says:

Note that quasi-succinct indices are memory-mapped by default, and for bitstream indices there is a limit of two gigabytes for in-memory indices.

This would seem to suggest that we are already running memory-mapped. Need to see if the 2GB limit applies to us.
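One rough way to check whether the limit matters here is to compare the on-disk size of each index file against two gigabytes. The sketch below is only an illustration: the class name `InMemoryLimitCheck` is hypothetical, the exact cutoff (2 GiB here) is an assumption based on the "two gigabytes" wording in the Javadoc, and the index file paths must be supplied by hand because this thread does not list them.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of mg4j or mg4j-workbench.
final class InMemoryLimitCheck {
    // Assumed cutoff: the Javadoc only says "two gigabytes", so 2 GiB is a guess.
    static final long LIMIT_BYTES = 2L * 1024 * 1024 * 1024;

    // True if a file of this size would be larger than the assumed limit.
    static boolean exceedsLimit(long sizeInBytes) {
        return sizeInBytes > LIMIT_BYTES;
    }

    public static void main(String[] args) throws IOException {
        // Pass the index files to inspect as command-line arguments.
        for (String name : args) {
            Path path = Paths.get(name);
            long size = Files.size(path);
            String verdict = exceedsLimit(size) ? "over the assumed limit" : "within the assumed limit";
            System.out.println(path + ": " + size + " bytes, " + verdict);
        }
    }
}
```

Running it with the text and title index files as arguments prints one line per file. A file over the limit would suggest the in-memory option is not usable for a bitstream index of that size.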

Also, note that one can enable in-memory or memory-mapped behavior by appending ?inmemory=1 or ?mapped=1, respectively, to the basename URI parameter. This can be set on lines 73-74 of QueryLogRunner.java:


        text = Index.getInstance( basename + "-text?inmemory=1", true, true );
        title = Index.getInstance( basename + "-title?inmemory=1", true, true );
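
To switch between the default, in-memory, and memory-mapped variants without editing string literals, the URI could be built by a small helper. This is a minimal sketch: `IndexUris`, `LoadMode`, and `indexUri` are hypothetical names, and only the `?inmemory=1` and `?mapped=1` suffixes come from the comment above.

```java
// Hypothetical helper, not part of mg4j or QueryLogRunner.
final class IndexUris {
    enum LoadMode { DEFAULT, IN_MEMORY, MAPPED }

    // Builds e.g. "basename-text?inmemory=1" for field "text" in IN_MEMORY mode.
    static String indexUri(String basename, String field, LoadMode mode) {
        String uri = basename + "-" + field;
        switch (mode) {
            case IN_MEMORY:
                return uri + "?inmemory=1";
            case MAPPED:
                return uri + "?mapped=1";
            default:
                return uri; // leave the loading strategy to mg4j's default
        }
    }

    public static void main(String[] args) {
        String basename = args.length > 0 ? args[0] : "index";
        // The resulting strings would be passed as the first argument of
        // Index.getInstance(..., true, true), as in the snippet above.
        System.out.println(indexUri(basename, "text", LoadMode.IN_MEMORY));
        System.out.println(indexUri(basename, "title", LoadMode.MAPPED));
    }
}
```

With a single mode value chosen at startup, the same query log can be run against each loading strategy for comparison.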

MikeHopcroft commented 7 years ago

This link mentions the 2GB limit:

if you have more then 2GB of memory try to use java -Xmx2G

Contrast this with the -Xmx512M flags.
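
To confirm which heap setting is actually in effect for a run, the JVM can report its own maximum heap. The sketch below uses only the standard `Runtime.maxMemory()` call; the class name `HeapReport` is hypothetical, and the reported value may be somewhat lower than the `-Xmx` figure depending on the JVM.

```java
// Hypothetical diagnostic, not part of mg4j-workbench.
final class HeapReport {
    // Converts a byte count to whole mebibytes (rounding down).
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Maximum amount of memory the JVM will attempt to use for the heap.
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap: " + toMiB(maxHeap) + " MiB");
    }
}
```

Launching it once with `java -Xmx512M HeapReport` and once with `java -Xmx2G HeapReport` shows the difference between the two settings discussed above.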