apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0

Create benchmark & approach for testing Lucene's near real-time performance [LUCENE-2061] #3137

Open asfimport opened 14 years ago

asfimport commented 14 years ago

With the improvements to contrib/benchmark in #3126, it's now possible to create compelling algs to test indexing & searching throughput against a periodically reopened near-real-time reader from the IndexWriter.

Coming out of the discussions in #2600, I think to properly characterize NRT, we should measure net search throughput as a function of both reopen rate (ie how often you get a new NRT reader from the writer) and indexing rate. We should also separately measure pure adds vs updates (deletes + adds); the latter is much more work for Lucene.
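For concreteness, here is a minimal Java sketch of the two indexing modes being compared, using current Lucene APIs and a hypothetical `id` field (nrtBench.py drives the equivalent from Python rather than code like this):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class AddVsUpdate {
  // Pure add: simply appends a new document to the in-RAM segment.
  static void add(IndexWriter writer, String id, String body) throws Exception {
    Document doc = new Document();
    doc.add(new StringField("id", id, Field.Store.NO));
    doc.add(new TextField("body", body, Field.Store.NO));
    writer.addDocument(doc);
  }

  // Update: a delete-by-term plus an add; Lucene must also resolve the
  // buffered delete against existing segments, which is the extra work
  // referred to above.
  static void update(IndexWriter writer, String id, String body) throws Exception {
    Document doc = new Document();
    doc.add(new StringField("id", id, Field.Store.NO));
    doc.add(new TextField("body", body, Field.Store.NO));
    writer.updateDocument(new Term("id", id), doc);
  }
}
```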

This can help apps make capacity decisions... and can help us test performance of pending improvements for NRT (eg #2390, LUCENE-2047).


Migrated from LUCENE-2061 by Michael McCandless (@mikemccand), updated Nov 29 2009. Attachments: LUCENE-2061.patch (versions: 3)

asfimport commented 14 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Attached first cut python script nrtBench.py.

You have to edit the constants up top to point to both a Wikipedia XML export and a Wikipedia line file. It uses the XML export to build up the base index, and then the line file to do the "live" indexing.

It first runs a baseline, redlining searches with 9 (default) threads, and reports the net QPS. (You'll have to write a queries.txt with the queries to test.) Then it steps through NRT reopen periods of 0.1, 1.0, 2.5, and 5.0 seconds crossed with indexing rates of 1, 10, 100, and 1000 docs per second (using 2 indexing threads), and redlines the search threads, comparing their search throughput to the baseline.
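For readers who don't want to dig through the script, here is a rough Java sketch of the shape of one measurement cell: a thread that refreshes an NRT searcher on a fixed period while search threads run flat out and count queries. Everything here is illustrative and not taken from nrtBench.py; the class name `NrtCell`, the constant `REOPEN_PERIOD_MS`, and the `body` field name are assumptions, though the term query "1" follows the description later in this thread.

```java
import java.util.concurrent.atomic.AtomicLong;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.SearcherFactory;
import org.apache.lucene.search.SearcherManager;
import org.apache.lucene.search.TermQuery;

public class NrtCell {
  static final long REOPEN_PERIOD_MS = 1000;      // one point on the reopen-rate axis
  static final AtomicLong queryCount = new AtomicLong();

  public static void run(IndexWriter writer, int numSearchThreads, long runMillis) throws Exception {
    SearcherManager mgr = new SearcherManager(writer, new SearcherFactory());

    // Reopen thread: periodically pulls a fresh NRT view from the writer.
    Thread reopener = new Thread(() -> {
      try {
        long end = System.currentTimeMillis() + runMillis;
        while (System.currentTimeMillis() < end) {
          mgr.maybeRefresh();
          Thread.sleep(REOPEN_PERIOD_MS);
        }
      } catch (Exception e) { throw new RuntimeException(e); }
    });
    reopener.start();

    // Redline search threads: run the query as fast as possible, counting queries.
    Thread[] searchers = new Thread[numSearchThreads];
    for (int i = 0; i < numSearchThreads; i++) {
      searchers[i] = new Thread(() -> {
        try {
          long end = System.currentTimeMillis() + runMillis;
          while (System.currentTimeMillis() < end) {
            IndexSearcher s = mgr.acquire();
            try {
              s.search(new TermQuery(new Term("body", "1")), 10);
              queryCount.incrementAndGet();
            } finally {
              mgr.release(s);
            }
          }
        } catch (Exception e) { throw new RuntimeException(e); }
      });
      searchers[i].start();
    }

    reopener.join();
    for (Thread t : searchers) t.join();
    mgr.close();
    System.out.println("QPS = " + (queryCount.get() * 1000.0 / runMillis));
  }
}
```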

asfimport commented 14 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

New nrtBench.py attached, fixing a few small issues. Also, I removed -Xbatch from the java command line; it seemed to make results less consistent.

My initial results:

JAVA: java version "1.6.0_14" Java(TM) SE Runtime Environment (build 1.6.0_14-b08) Java HotSpot(TM) 64-Bit Server VM (build 14.0-b16, mixed mode)

OS: SunOS rhumba 5.11 snv_111b i86pc i386 i86pc Solaris

Baseline QPS 158.12

| Indexing docs/sec | NRT reopen period (sec) | QPS add | QPS update | QPS add (% diff) | QPS update (% diff) |
| --- | --- | --- | --- | --- | --- |
| 1 | 1 | 157.5 | 125.7 | -0.4% | -20.5% |
| 1 | 2.5 | 157.6 | 127.5 | -0.4% | -19.4% |
| 1 | 5 | 156.9 | 127.2 | -0.8% | -19.5% |
| 10 | 0.1 | 156.3 | 142.4 | -1.2% | -9.9% |
| 10 | 0.5 | 155.8 | 125.0 | -1.5% | -20.9% |
| 10 | 1 | 156.0 | 142.6 | -1.3% | -9.8% |
| 10 | 2.5 | 156.6 | 143.4 | -0.9% | -9.3% |
| 10 | 5 | 156.2 | 144.0 | -1.2% | -8.9% |
| 100 | 0.1 | 153.9 | 138.8 | -2.7% | -12.2% |
| 100 | 0.5 | 155.0 | 141.1 | -2.0% | -10.8% |
| 100 | 1 | 156.1 | 141.3 | -1.3% | -10.6% |
| 100 | 2.5 | 155.9 | 116.7 | -1.4% | -26.2% |
| 100 | 5 | 157.0 | 143.8 | -0.7% | -9.1% |
| 1000 | 0.1 | 145.9 | 110.0 | -7.7% | -30.4% |
| 1000 | 0.5 | 148.0 | 117.6 | -6.4% | -25.6% |
| 1000 | 1 | 148.3 | 97.7 | -6.2% | -38.2% |
| 1000 | 2.5 | 149.3 | 99.1 | -5.6% | -37.3% |
| 1000 | 5 | 147.4 | 124.3 | -6.8% | -21.4% |

The docs are ~1 KB docs derived from Wikipedia. The searching only runs a single fixed query ("1"), over and over.

Some rough observations:

Looks like we have some puzzles to solve...

asfimport commented 14 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

OK, the last test had a silly bug that made the update QPS slowdown, even at low indexing & reopen rates, look worse than it should...

The test ran on a fully optimized index, ie, it had no deletions.

So the pure searching & add tests had no deletedDocs vector to check, but the update test, after the very first doc was indexed, had to check the deletedDocs. So, really, that 20% slowdown we see right off the bat for the updates case is the added cost of having to check the BitVector.
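To make the added cost concrete, here is a hedged sketch of the per-hit check at the segment level; modern Lucene exposes the deleted-docs BitVector as `LeafReader.getLiveDocs()`, and the class and method names here are only illustrative:

```java
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.index.PostingsEnum;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.Bits;

public class LiveDocsCheck {
  // Roughly what matching a term has to do per segment: once the segment
  // has any deletions, every candidate doc costs an extra bit check.
  static long countLiveMatches(LeafReader reader, Term term) throws Exception {
    Bits liveDocs = reader.getLiveDocs();   // null when the segment has no deletions
    PostingsEnum postings = reader.postings(term);
    if (postings == null) return 0;
    long count = 0;
    for (int doc = postings.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = postings.nextDoc()) {
      if (liveDocs == null || liveDocs.get(doc)) {   // the check the add-only case skipped
        count++;
      }
    }
    return count;
  }
}
```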

So the test was unfair. I'll re-run after deleting one doc from the base index...

asfimport commented 14 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

OK, I modified nrtBench.py to take advantage of some of the features in #3155.

I made some other small changes, eg changed -report to create a separate 'add only' vs 'delete + add' table.

Finally, I switched to a non-optimized 5M Wikipedia index (12 segments), with 1% deletions. I think this is more typical of the index an app would have after running NRT for a while.
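For anyone reproducing a similar base index, here is a sketch of one way to leave roughly 1% deletions behind, assuming the docs were indexed with an `id` field holding the doc number as a string (the real index here was built from the Wikipedia export via the script):

```java
import java.nio.file.Paths;
import java.util.Random;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.FSDirectory;

public class AddDeletions {
  public static void main(String[] args) throws Exception {
    int maxDoc = 5_000_000;
    Random random = new Random(17);
    try (FSDirectory dir = FSDirectory.open(Paths.get(args[0]));
         IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
      // Delete ~1% of documents by their id term, then commit without
      // forceMerge so the index keeps its multiple segments plus deletions.
      for (int i = 0; i < maxDoc / 100; i++) {
        writer.deleteDocuments(new Term("id", Integer.toString(random.nextInt(maxDoc))));
      }
      writer.commit();
    }
  }
}
```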

New results:

JAVA: java version "1.6.0_14" Java(TM) SE Runtime Environment (build 1.6.0_14-b08) Java HotSpot(TM) 64-Bit Server VM (build 14.0-b16, mixed mode)

OS: SunOS rhumba 5.11 snv_111b i86pc i386 i86pc Solaris

Baseline QPS 144.24

Add only:

| Docs/sec | Reopen every (sec) | Reopen mean (ms) | Reopen stddev (ms) | QPS | % diff |
| --- | --- | --- | --- | --- | --- |
| 10.0 | 0.1 | 0.0 | 1.0 | 132.11 | -8.4% |
| 10.0 | 1.0 | 3.0 | 0.0 | 132.79 | -7.9% |
| 10.0 | 5.0 | 9.0 | 2.0 | 121.31 | -15.9% |
| 10.0 | 10.0 | 14.0 | 2.0 | 134.7 | -6.6% |
| 10.0 | 33.3 | 30.0 | 3.7 | 133.57 | -7.4% |
| 100.0 | 0.1 | 2.0 | 0.0 | 142.02 | -1.5% |
| 100.0 | 1.0 | 12.0 | 1.4 | 125.9 | -12.7% |
| 100.0 | 5.0 | 41.0 | 2.8 | 105.46 | -26.9% |
| 100.0 | 10.0 | 61.0 | 4.2 | 126.09 | -12.6% |
| 100.0 | 33.3 | 128.0 | 5.8 | 141.46 | -1.9% |
| 1000.0 | 0.1 | 15.0 | 168.8 | 102.14 | -29.2% |
| 1000.0 | 1.0 | 62.0 | 5.1 | 117.06 | -18.8% |
| 1000.0 | 5.0 | 192.0 | 7.4 | 123.7 | -14.2% |
| 1000.0 | 10.0 | 166.0 | 10.3 | 97.57 | -32.4% |
| 1000.0 | 33.3 | 162.0 | 12.1 | 101.52 | -29.6% |

Delete + add:

| Docs/sec | Reopen every (sec) | Reopen mean (ms) | Reopen stddev (ms) | QPS | % diff |
| --- | --- | --- | --- | --- | --- |
| 10.0 | 0.1 | 1.0 | 1.7 | 132.82 | -7.9% |
| 10.0 | 1.0 | 6.0 | 1.0 | 134.57 | -6.7% |
| 10.0 | 5.0 | 21.0 | 8.8 | 119.37 | -17.2% |
| 10.0 | 10.0 | 38.0 | 17.4 | 129.19 | -10.4% |
| 10.0 | 33.3 | 82.0 | 11.1 | 135.14 | -6.3% |
| 100.0 | 0.1 | 6.0 | 1.0 | 127.01 | -11.9% |
| 100.0 | 1.0 | 34.0 | 6.8 | 141.1 | -2.2% |
| 100.0 | 5.0 | 126.0 | 17.9 | 105.43 | -26.9% |
| 100.0 | 10.0 | 203.0 | 29.3 | 117.16 | -18.8% |
| 100.0 | 33.3 | 538.0 | 77.5 | 132.26 | -8.3% |
| 1000.0 | 0.1 | 45.0 | 187.8 | 96.84 | -32.9% |
| 998.9 | 1.0 | 246.0 | 41.0 | 95.32 | -33.9% |
| 996.6 | 5.0 | 941.0 | 154.4 | 102.17 | -29.2% |
| 999.5 | 10.0 | 1680.0 | 549.1 | 90.69 | -37.1% |
| 990.2 | 33.3 | 4587.0 | 2660.9 | 90.89 | -37.0% |

Observations:

asfimport commented 14 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

I was baffled by the sporadic QPS differences across reopen rates, so I ran another test, this time always flushing after 100 buffered docs:

JAVA: java version "1.6.0_14" Java(TM) SE Runtime Environment (build 1.6.0_14-b08) Java HotSpot(TM) 64-Bit Server VM (build 14.0-b16, mixed mode)

OS: SunOS rhumba 5.11 snv_111b i86pc i386 i86pc Solaris

Baseline QPS 146.74

Add only:

| Docs/sec | Reopen every (sec) | Reopen mean (ms) | Reopen stddev (ms) | QPS | % diff |
| --- | --- | --- | --- | --- | --- |
| 100.0 | 0.1 | 2.0 | 1.4 | 143.7 | -2.1% |
| 100.0 | 1.0 | 5.0 | 7.5 | 145.1 | -1.1% |
| 100.0 | 5.0 | 6.0 | 4.8 | 144.72 | -1.4% |
| 100.0 | 10.0 | 9.0 | 8.5 | 143.95 | -1.9% |
| 100.0 | 33.3 | 11.0 | 11.1 | 143.12 | -2.5% |

Baseline QPS 146.3

Delete + add:

| Docs/sec | Reopen every (sec) | Reopen mean (ms) | Reopen stddev (ms) | QPS | % diff |
| --- | --- | --- | --- | --- | --- |
| 100.0 | 0.1 | 6.0 | 2.2 | 143.15 | -2.2% |
| 100.0 | 1.0 | 28.0 | 10.1 | 133.78 | -8.6% |
| 100.0 | 5.0 | 77.0 | 29.9 | 143.28 | -2.1% |
| 100.0 | 10.0 | 92.0 | 49.5 | 142.63 | -2.5% |
| 100.0 | 33.3 | 91.0 | 47.4 | 143.57 | -1.9% |

Very strangely, flushing every 100 docs (ie once per second at this indexing rate, even if you're reopening at a slower rate) makes the QPS much more reasonable: it's pretty much unaffected by the ongoing indexing, whether add-only or delete + add. I don't know how to explain this...
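For reference, here is a sketch of the flush-by-doc-count setup in current API terms; at the time this was a setting directly on IndexWriter, and the class name `FlushEveryNDocs` is just an illustrative wrapper:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;

public class FlushEveryNDocs {
  static IndexWriter openWriter(Directory dir) throws Exception {
    IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer());
    // Flush a new segment every 100 buffered docs instead of by RAM usage;
    // at 100 docs/sec this means roughly one small flush per second,
    // independent of how often the NRT reader is reopened.
    cfg.setMaxBufferedDocs(100);
    cfg.setRAMBufferSizeMB(IndexWriterConfig.DISABLE_AUTO_FLUSH);
    return new IndexWriter(dir, cfg);
  }
}
```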

Also, note that reopen times are still longer for delete + add. This is because the deletes are still only resolved when it's time to reopen (or time to merge), not after every 100 docs. That also explains why going from a reopen interval of 10 sec to ~30 sec showed no change in the reopen time: after 10 seconds (= 10 new segments), a merge kicks off, which always resolves the deletes.

So I think this is good news, in that it brings QPS back up to nearly the baseline, but bad news in that I have no idea why...

asfimport commented 14 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Just attaching latest nrtBench.py...

asfimport commented 14 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

BTW, based on these last results, the rough conclusion seems to be that as long as you set up IW to flush every N docs (though I still don't understand why that's necessary), the ongoing indexing & reopening does not hurt QPS substantially compared to the "pure searching" baseline.

This is an important result. It means all the other optimizations we're pursuing for NRT are not really necessary (at least in the env I tested). I think it must be that the OS is quite efficient at creating smallish files and turning those files around for reading (ie its file write cache is "effectively" emulating a RAMDirectory).

asfimport commented 14 years ago

Jason Rutherglen (migrated from JIRA)

Mike, I tried running nrtBench.py; I generated a 2 million doc index since I didn't want to wait for the 5 million doc build to finish.

Can you post the queries file you've used? (nrtBench.py was looking for it.) I'd like to keep things as consistent as possible between runs.

I haven't seen the same results with regard to the OS managing small files, and I suspect that users in general will choose a variety of parameters (e.g. 1 max buffered doc) that make writing to disk inherently slow. Logically the OS should work as a write cache; in practice, however, it seems a variety of users have reported otherwise. Maybe 100 docs works, but that feels like a fairly narrow guideline for users of NRT.

The latest #2390 is a step in a direction that doesn't change IW internals too much.

asfimport commented 14 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

> Can you post the queries file you've used?

I only used TermQuery "1", sorting by score. I'd generally like to focus on worst case query latency rather than QPS of "easy" queries. Maybe we should switch to harder queries (phrase, boolean).
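For example, hedged sketches of the harder query shapes mentioned, assuming the same `body` field and made-up terms:

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class HarderQueries {
  // Exact phrase: every term's positions must be read and intersected.
  static Query phrase() {
    return new PhraseQuery("body", "united", "states");
  }

  // Conjunction of two terms: both postings lists are advanced in lockstep.
  static Query conjunction() {
    return new BooleanQuery.Builder()
        .add(new TermQuery(new Term("body", "united")), BooleanClause.Occur.MUST)
        .add(new TermQuery(new Term("body", "states")), BooleanClause.Occur.MUST)
        .build();
  }
}
```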

Though one thing I haven't yet focused on testing (which your work on #2859 would improve) is queries that hit the FieldCache – we should test that as well.
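A sketch of the kind of query meant here, ie one that needs per-field sort values: in the Lucene of this era those values came from the FieldCache (the cost the #2859 work relates to), while current Lucene reads them from doc values. The `date` field and `SortedSearch` class are illustrative assumptions:

```java
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TopDocs;

public class SortedSearch {
  // Sorting by a field forces per-segment sort values to be loaded, which
  // is extra work after every NRT reopen compared to sorting by score.
  static TopDocs searchSortedByDate(IndexSearcher searcher, Query query) throws Exception {
    Sort sort = new Sort(new SortField("date", SortField.Type.LONG));
    return searcher.search(query, 10, sort);
  }
}
```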

> I haven't seen the same results with regard to the OS managing small files, and I suspect that users in general will choose a variety of parameters (e.g. 1 max buffered doc) that make writing to disk inherently slow. Logically the OS should work as a write cache; in practice, however, it seems a variety of users have reported otherwise. Maybe 100 docs works, but that feels like a fairly narrow guideline for users of NRT.

Yeah we need to explore this (when OS doesn't do effective write-caching), in practice.

> The latest #2390 is a step in a direction that doesn't change IW internals too much.

I do like this simplification – basically IW is internally managing how best to use RAM in NRT mode – but I think we need to scrutinize (through benchmarking, here) whether this is really needed (ie, whether we can't simply rely on the OS to behave, with its IO cache).