apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0

Some small fixes to contrib/benchmark [LUCENE-1115] #2192

Closed by asfimport 16 years ago

asfimport commented 16 years ago

I've fixed a few small issues I've hit in contrib/benchmark.

First, the following alg did work only in the first round; every subsequent round finished immediately:

analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
doc.maker=org.apache.lucene.benchmark.byTask.feeds.LineDocMaker
work.dir = /lucene/work
docs.file=work/reuters.lines.txt
doc.maker.forever=false
directory=FSDirectory
doc.add.log.step=3000

{ "Rounds"
  ResetSystemErase
  CreateIndex
  { "AddDocs" AddDoc > : *
  CloseIndex
  NewRound
} : 3

I think this is because we fail to reset "exhausted" to false in PerfTask.doLogic(), so I added that reset. I also had to re-open the file in LineDocMaker.
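The failure mode described above can be illustrated with a minimal, self-contained sketch. It is a model of the behavior only, not the contrib/benchmark code: `ExhaustedResetSketch`, `FiniteSource`, and `runRounds` are hypothetical names, and the `resetEachRound` switch exists just to compare the behavior with and without the reset.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical model of the bug; these are not the real benchmark classes.
class ExhaustedResetSketch {

    /** A finite document source that must be rewound before it can be read again. */
    static class FiniteSource {
        private final List<String> docs;
        private Iterator<String> it;

        FiniteSource(List<String> docs) {
            this.docs = docs;
            reset();
        }

        /** Analogous to re-opening the input file. */
        void reset() {
            it = docs.iterator();
        }

        /** Returns the next doc, or null once the source is exhausted. */
        String next() {
            return it.hasNext() ? it.next() : null;
        }
    }

    /** Runs "add docs until exhausted" once per round; returns docs added in each round. */
    static int[] runRounds(FiniteSource source, int rounds, boolean resetEachRound) {
        int[] added = new int[rounds];
        boolean exhausted = false;
        for (int r = 0; r < rounds; r++) {
            if (resetEachRound) {
                exhausted = false; // clear the stale flag at the start of each round
                source.reset();    // and rewind the input
            }
            while (!exhausted) {
                if (source.next() == null) {
                    exhausted = true;
                } else {
                    added[r]++;
                }
            }
        }
        return added;
    }

    public static void main(String[] args) {
        FiniteSource src = new FiniteSource(Arrays.asList("a", "b", "c"));
        // Without the reset, only the first round does any work: [3, 0, 0]
        System.out.println(Arrays.toString(runRounds(src, 3, false)));
        // With the reset, every round does the same work: [3, 3, 3]
        System.out.println(Arrays.toString(runRounds(src, 3, true)));
    }
}
```

In this model both steps are needed: clearing the flag alone would let the loop start, but the un-rewound source would report exhaustion again on the first read.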

Second, I made a small optimization: updateExhausted is now called only when at least one child task is a TaskSequence or a ResetInputsTask, which I compute up front.
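As a rough illustration of that optimization, the sketch below computes the flag once, when the sequence is built, and consults it on every run. The class names echo the ones mentioned in this issue, but the bodies are assumptions made for illustration; `exhaustedChecks` is a hypothetical counter standing in for the real updateExhausted call.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins; the real tasks live in contrib/benchmark and differ in detail.
class ExhaustableCheckSketch {

    interface Task {}

    static class AddDocTask implements Task {}

    static class ResetInputsTask implements Task {}

    static class TaskSequence implements Task {
        final List<Task> children;
        final boolean anyExhaustableTasks; // computed once, up front
        int exhaustedChecks = 0;           // counts how often the check runs

        TaskSequence(Task... children) {
            this.children = Arrays.asList(children);
            boolean any = false;
            for (Task t : this.children) {
                if (t instanceof TaskSequence || t instanceof ResetInputsTask) {
                    any = true;
                    break;
                }
            }
            this.anyExhaustableTasks = any;
        }

        /** Runs each child once, doing the exhaustion check only when it can matter. */
        void runOnce() {
            for (Task t : children) {
                // ... run t here (omitted in this sketch) ...
                if (anyExhaustableTasks) {
                    exhaustedChecks++; // stands in for updateExhausted(t)
                }
            }
        }
    }
}
```

A sequence made only of AddDocTask children skips the check entirely, while a sequence that contains a nested TaskSequence or a ResetInputsTask performs it after every child.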

Finally, we did not allow flushing by both RAM usage and doc count, so I fixed the logic in CreateIndexTask and OpenIndexTask to set both RAMBufferSizeMB and MaxBufferedDocs.
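To make the intended behavior concrete, here is a small model of a flush decision with both triggers active at once. It is a sketch under assumptions, not IndexWriter's implementation: `FlushTriggerSketch`, `shouldFlush`, and the `DISABLED` sentinel are hypothetical names chosen for this example.

```java
// Hypothetical model of "flush by RAM and by doc count"; not Lucene's actual code.
class FlushTriggerSketch {

    /** Sentinel meaning "this trigger is switched off" (an assumption of this sketch). */
    static final int DISABLED = -1;

    final double ramBufferSizeMB;
    final int maxBufferedDocs;

    FlushTriggerSketch(double ramBufferSizeMB, int maxBufferedDocs) {
        this.ramBufferSizeMB = ramBufferSizeMB;
        this.maxBufferedDocs = maxBufferedDocs;
    }

    /** Flush as soon as either enabled limit is reached. */
    boolean shouldFlush(double ramUsedMB, int bufferedDocs) {
        boolean byRam = ramBufferSizeMB != DISABLED && ramUsedMB >= ramBufferSizeMB;
        boolean byDocs = maxBufferedDocs != DISABLED && bufferedDocs >= maxBufferedDocs;
        return byRam || byDocs;
    }

    public static void main(String[] args) {
        FlushTriggerSketch both = new FlushTriggerSketch(16.0, 1000);
        System.out.println(both.shouldFlush(4.0, 1000)); // doc-count limit reached
        System.out.println(both.shouldFlush(16.5, 10));  // RAM limit reached
        System.out.println(both.shouldFlush(4.0, 10));   // neither limit reached
    }
}
```

The point of setting both values is that neither limit silently disables the other: whichever is hit first triggers the flush.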


Migrated from LUCENE-1115 by Michael McCandless (@mikemccand), resolved Jan 03 2008
Attachments: LUCENE-1115.patch

asfimport commented 16 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Attached patch. All tests pass. I plan to commit in a day or so.

asfimport commented 16 years ago

Doron Cohen (migrated from JIRA)

Definitely a bug. The patch looks good, and I like the optimization. Thanks for fixing this, Mike.

Perhaps rename in TaskSequence from anyExhaustedTasks to anyExhaustableTasks?

Also, this new test (belongs in TestPerfTaskLogic) passes with the fix but fails without it:

  /**
   * Test that exhaust in loop works as expected (LUCENE-1115).
   */
  public void testExhaustedLooped() throws Exception {
    // 1. alg definition (required in every "logic" test)
    String algLines[] = {
        "# ----- properties ",
        "doc.maker="+Reuters20DocMaker.class.getName(),
        "doc.add.log.step=3",
        "doc.term.vector=false",
        "doc.maker.forever=false",
        "directory=RAMDirectory",
        "doc.stored=false",
        "doc.tokenized=false",
        "debug.level=1",
        "# ----- alg ",
        "{ \"Rounds\"",
        "  ResetSystemErase",
        "  CreateIndex",
        "  { \"AddDocs\"  AddDoc > : * ",
        "  CloseIndex",
        "} : 2",
    };

    // 2. execute the algorithm  (required in every "logic" test)
    Benchmark benchmark = execBenchmark(algLines);

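    // 3. test number of docs in the index
    //    (each round starts with ResetSystemErase and CreateIndex, so only the
    //    docs added in the last round remain; without the fix that round adds none)
    IndexReader ir = IndexReader.open(benchmark.getRunData().getDirectory());
    int ndocsExpected = 20; // Reuters20DocMaker exhausts after 20 docs.
    assertEquals("wrong number of docs in the index!", ndocsExpected, ir.numDocs());
    ir.close();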
  }

Cheers, Doron

asfimport commented 16 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Awesome, I will add that test case. Thanks Doron!