Robert Muir (@rmuir) (migrated from JIRA)
Here's the most important benchmark: speeding up MultiMMapIndexInput's readByte() and readBytes() in general.
MultiMMapIndexInput readByte(s) improvements [trunk, Standard codec]
Query | QPS trunk | QPS patch | Pct diff |
---|---|---|---|
spanFirst(unit, 5) | 12.72 | 12.85 | 1.0% |
+nebraska +state | 137.47 | 139.33 | 1.3% |
spanNear([unit, state], 10, true) | 2.90 | 2.94 | 1.4% |
"unit state" | 5.88 | 5.99 | 1.8% |
unit\~2.0 | 7.06 | 7.20 | 2.0% |
+unit +state | 8.68 | 8.87 | 2.2% |
unit state | 8.00 | 8.23 | 2.9% |
unit\~1.0 | 7.19 | 7.41 | 3.0% |
unit* | 22.66 | 23.41 | 3.3% |
uni* | 12.54 | 13.12 | 4.6% |
united\~1.0 | 10.61 | 11.12 | 4.8% |
united\~2.0 | 2.52 | 2.65 | 5.1% |
state | 28.72 | 30.23 | 5.3% |
un*d | 44.84 | 48.06 | 7.2% |
u*d | 13.17 | 14.51 | 10.2% |
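Some background on MultiMMapIndexInput: a file larger than the configured chunk size is mapped as several ByteBuffers, so readBytes() has to copy data across chunk boundaries. The following is a minimal sketch of one way to do that. It assumes plain heap ByteBuffers stand in for the mmapped chunks. The class `ChunkedBytes` and its helper `of` are hypothetical and are not the code in LUCENE-2816.patch.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: a byte stream split across several ByteBuffer "chunks",
// standing in for the separately mmapped regions of one large file.
public class ChunkedBytes {
    private final ByteBuffer[] chunks;
    private int chunkIdx = 0;
    private ByteBuffer cur;

    public ChunkedBytes(ByteBuffer[] chunks) {
        this.chunks = chunks;
        this.cur = chunks[0];
    }

    // Advance to the next chunk, failing if we are already on the last one.
    private void nextChunk() {
        if (chunkIdx + 1 >= chunks.length) {
            throw new IllegalStateException("read past EOF");
        }
        cur = chunks[++chunkIdx];
    }

    // Single-byte read that moves into the next chunk when the current one is exhausted.
    public byte readByte() {
        while (!cur.hasRemaining()) {
            nextChunk();
        }
        return cur.get();
    }

    // Bulk read: one get(byte[], int, int) call per chunk touched,
    // instead of one readByte() call per byte.
    public void readBytes(byte[] dst, int offset, int len) {
        while (len > 0) {
            if (!cur.hasRemaining()) {
                nextChunk();
                continue;
            }
            int n = Math.min(len, cur.remaining());
            cur.get(dst, offset, n);
            offset += n;
            len -= n;
        }
    }

    // Example helper: split data into chunks of chunkSize bytes each.
    public static ChunkedBytes of(byte[] data, int chunkSize) {
        int count = Math.max(1, (data.length + chunkSize - 1) / chunkSize);
        ByteBuffer[] bufs = new ByteBuffer[count];
        for (int i = 0; i < count; i++) {
            int start = i * chunkSize;
            int end = Math.min(data.length, start + chunkSize);
            bufs[i] = ByteBuffer.wrap(data, start, end - start).slice();
        }
        return new ChunkedBytes(bufs);
    }
}
```

Reading 250 bytes split into 37-byte chunks exercises several boundary crossings. The bulk path pays the per-chunk bookkeeping once per chunk rather than once per byte.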
In the bulk postings branch, I've been experimenting with various techniques for FOR/PFOR, and one thing I tried was simply decoding with readInt() from the DataInput. So I adapted FOR/PFOR to take a DataInput and work on it directly, instead of reading into a byte[], wrapping it with a ByteBuffer, and working on an IntBuffer view.
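The minimal sketch below shows the two decoding paths side by side by decoding the same raw 32-bit words both ways. It uses `java.io.DataInput` (through `DataInputStream`) as a stand-in for Lucene's own DataInput class, since both read ints big-endian. The class `IntDecoding` is hypothetical, and the actual FOR/PFOR bit unpacking that would follow is omitted.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.IntBuffer;

// Hypothetical sketch: two ways to pull `count` raw 32-bit words out of a DataInput.
// java.io.DataInput stands in for Lucene's DataInput; both are big-endian.
public class IntDecoding {

    // Path A: bulk-read the bytes, wrap them, and decode through an IntBuffer view.
    public static int[] viaIntBuffer(DataInput in, int count) {
        try {
            byte[] bytes = new byte[count * 4];
            in.readFully(bytes);
            IntBuffer view = ByteBuffer.wrap(bytes).asIntBuffer(); // inherits BIG_ENDIAN
            int[] out = new int[count];
            view.get(out);
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Path B: decode directly from the DataInput, one readInt() per word.
    public static int[] viaReadInt(DataInput in, int count) {
        try {
            int[] out = new int[count];
            for (int i = 0; i < count; i++) {
                out[i] = in.readInt();
            }
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Helper for the example: a DataInput over an in-memory byte array.
    public static DataInput input(byte[] bytes) {
        return new DataInputStream(new ByteArrayInputStream(bytes));
    }
}
```

Both paths must yield identical values. Which path is faster depends entirely on how cheap the underlying readInt() is, and that cost is what this issue addresses.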
But when I did this, I found that MMap was slow for readInt() etc., so we now implement these primitives with ByteBuffer.getInt() and friends. This isn't very important, since Lucene doesn't use these much and the benefit is mostly theoretical, but I still think methods like readInt(), readShort(), and readLong() should be fast. For example, just earlier today someone posted an alternative PFOR implementation on #2484 that uses DataInput.readInt().
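The difference between the generic path and a ByteBuffer-backed override can be sketched as follows. The sketch assumes a simplified, hypothetical DataInput-like base class in which only readByte() is abstract and readInt() defaults to assembling four bytes. The buffer-backed subclass instead delegates to ByteBuffer.getInt(), following the idea described above. This is an illustration, not the code from the patch.

```java
import java.nio.ByteBuffer;

public class ReadIntDelegation {

    // Hypothetical, simplified DataInput-like base class: only readByte() is abstract.
    public abstract static class SimpleDataInput {
        public abstract byte readByte();

        // Generic fallback: assemble a big-endian int from four readByte() calls
        // (Java evaluates the operands left to right, so byte order is preserved).
        public int readInt() {
            return ((readByte() & 0xFF) << 24) | ((readByte() & 0xFF) << 16)
                 | ((readByte() & 0xFF) << 8) | (readByte() & 0xFF);
        }
    }

    // Buffer-backed input that relies on the inherited four-byte readInt().
    public static class BytesOnlyInput extends SimpleDataInput {
        private final ByteBuffer buffer;
        public BytesOnlyInput(ByteBuffer buffer) { this.buffer = buffer; }
        @Override public byte readByte() { return buffer.get(); }
    }

    // Buffer-backed input that overrides readInt() to delegate to ByteBuffer.getInt():
    // a single call instead of four readByte() calls.
    public static class ByteBufferInput extends SimpleDataInput {
        private final ByteBuffer buffer;
        public ByteBufferInput(ByteBuffer buffer) { this.buffer = buffer; }
        @Override public byte readByte() { return buffer.get(); }
        @Override public int readInt() { return buffer.getInt(); }
    }
}
```

Both variants decode identically, because a ByteBuffer's default byte order is big-endian. The override only changes how many calls it takes to get there.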
MMapIndexInput readInt() improvements [bulkpostings, FrameOfRefDataInput codec]
Query | QPS branch | QPS patch | Pct diff |
---|---|---|---|
spanFirst(unit, 5) | 12.14 | 11.99 | -1.2% |
united\~1.0 | 11.32 | 11.33 | 0.1% |
united\~2.0 | 2.51 | 2.56 | 2.1% |
unit\~1.0 | 6.98 | 7.19 | 3.0% |
unit\~2.0 | 6.88 | 7.11 | 3.3% |
spanNear([unit, state], 10, true) | 2.81 | 2.96 | 5.2% |
unit state | 8.04 | 8.59 | 6.8% |
+unit +state | 10.97 | 12.12 | 10.5% |
unit* | 26.67 | 29.80 | 11.7% |
"unit state" | 5.59 | 6.27 | 12.3% |
uni* | 15.10 | 17.51 | 15.9% |
state | 33.20 | 38.72 | 16.6% |
+nebraska +state | 59.17 | 71.45 | 20.8% |
un*d | 35.98 | 47.14 | 31.0% |
u*d | 9.48 | 12.46 | 31.4% |
Here's the same benchmark of DataInput.readInt(), but with MultiMMapIndexInput:
MultiMMapIndexInput readInt() improvements [bulkpostings, FrameOfRefDataInput codec]
Query | QPS branch | QPS patch | Pct diff |
---|---|---|---|
united\~2.0 | 2.43 | 2.54 | 4.3% |
united\~1.0 | 10.78 | 11.39 | 5.7% |
unit\~1.0 | 6.81 | 7.21 | 5.8% |
unit\~2.0 | 6.62 | 7.05 | 6.5% |
spanNear([unit, state], 10, true) | 2.77 | 2.96 | 6.6% |
unit state | 7.85 | 8.53 | 8.7% |
spanFirst(unit, 5) | 10.50 | 11.71 | 11.5% |
+unit +state | 10.26 | 11.94 | 16.3% |
"unit state" | 5.39 | 6.31 | 17.0% |
state | 31.95 | 39.17 | 22.6% |
unit* | 24.39 | 31.02 | 27.2% |
+nebraska +state | 54.68 | 71.98 | 31.6% |
u*d | 9.53 | 12.62 | 32.5% |
uni* | 13.72 | 18.23 | 32.9% |
un*d | 35.87 | 48.19 | 34.3% |
Just to be sure, I also ran this last one on sparc64 (big-endian):
MultiMMapIndexInput readInt() improvements [bulkpostings, FrameOfRefDataInput codec]
Query | QPS branch | QPS patch | Pct diff |
---|---|---|---|
united\~2.0 | 2.23 | 2.26 | 1.5% |
unit\~2.0 | 6.37 | 6.47 | 1.6% |
united\~1.0 | 11.33 | 11.59 | 2.3% |
unit\~1.0 | 9.68 | 10.05 | 3.7% |
spanNear([unit, state], 10, true) | 15.60 | 17.54 | 12.5% |
unit* | 127.14 | 144.08 | 13.3% |
unit state | 44.93 | 51.30 | 14.2% |
spanFirst(unit, 5) | 58.42 | 68.37 | 17.0% |
uni* | 56.66 | 67.53 | 19.2% |
+nebraska +state | 215.62 | 262.99 | 22.0% |
+unit +state | 63.18 | 77.86 | 23.2% |
"unit state" | 32.24 | 40.05 | 24.2% |
u*d | 29.13 | 36.69 | 26.0% |
state | 145.99 | 188.33 | 29.0% |
un*d | 65.27 | 87.20 | 33.6% |
I think some of these benchmarks also show that MultiMMapIndexInput might now be essentially as fast as MMapIndexInput... but let's not go there yet and keep them separate for now.
Simon Willnauer (@s1monw) (migrated from JIRA)
Awesome results robert!! :)
Uwe Schindler (@uschindler) (migrated from JIRA)
Awesome, Robert is becoming the MMap Performance Policeman!
I like the idea of simply delegating the methods and catching the exception to fall back to a manual read with boundary transition! I just wanted to be sure that the position pointer in the buffer does not partly go forward when a read request fails at a buffer boundary, but that seems to be the case.
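The pattern described here can be sketched roughly as follows: delegate to the current buffer, and on BufferUnderflowException fall back to a byte-by-byte read that crosses the boundary. The class `MultiBufferInput` and its `split()` helper are hypothetical. The fallback is only correct because a failed relative getInt() does not move the buffer position.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;

// Hypothetical sketch of a multi-buffer input using the "try the fast path,
// catch BufferUnderflowException, fall back" pattern.
public class MultiBufferInput {
    private final ByteBuffer[] buffers;
    private int curIdx = 0;
    private ByteBuffer cur;

    public MultiBufferInput(ByteBuffer[] buffers) {
        this.buffers = buffers;
        this.cur = buffers[0];
    }

    public byte readByte() {
        try {
            return cur.get();
        } catch (BufferUnderflowException e) {
            // Current buffer exhausted: switch to the next one and retry.
            if (curIdx + 1 >= buffers.length) {
                throw new IllegalStateException("read past EOF");
            }
            cur = buffers[++curIdx];
            return cur.get();
        }
    }

    public int readInt() {
        try {
            // Fast path: all four bytes lie inside the current buffer.
            return cur.getInt();
        } catch (BufferUnderflowException e) {
            // A failed getInt() leaves the position unchanged, so re-read the
            // same four bytes one at a time, crossing into the next buffer.
            return ((readByte() & 0xFF) << 24) | ((readByte() & 0xFF) << 16)
                 | ((readByte() & 0xFF) << 8) | (readByte() & 0xFF);
        }
    }

    // Example helper: split data into buffers of chunkSize bytes each.
    public static MultiBufferInput split(byte[] data, int chunkSize) {
        int count = Math.max(1, (data.length + chunkSize - 1) / chunkSize);
        ByteBuffer[] bufs = new ByteBuffer[count];
        for (int i = 0; i < count; i++) {
            int start = i * chunkSize;
            int end = Math.min(data.length, start + chunkSize);
            bufs[i] = ByteBuffer.wrap(data, start, end - start).slice();
        }
        return new MultiBufferInput(bufs);
    }
}
```

With an odd chunk size such as 7, most ints straddle a boundary, so both the fast path and the fallback get exercised. The common case costs one getInt() call, and the exception path is only taken near chunk edges.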
Uwe Schindler (@uschindler) (migrated from JIRA)
One thing to add: when using readFloat & co, we should make sure that we set the endianness explicitly in the ctor. I just want to explicitly make sure that the endianness is correct, and document that it is big-endian for Lucene.
We don't need that: "The initial order of a byte buffer is always BIG_ENDIAN."
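The quoted guarantee is easy to check using only the standard java.nio API. A freshly created buffer reports BIG_ENDIAN regardless of the platform's native order, which matches the big-endian layout Lucene's readInt() expects. Only an explicit `order(ByteOrder.LITTLE_ENDIAN)` call would change how the same bytes decode. `EndiannessCheck` is a hypothetical helper name.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper showing that new ByteBuffers default to big-endian.
public class EndiannessCheck {

    // True if heap, direct, and wrapped buffers all start out BIG_ENDIAN.
    public static boolean newBuffersAreBigEndian() {
        return ByteBuffer.allocate(8).order() == ByteOrder.BIG_ENDIAN
            && ByteBuffer.allocateDirect(8).order() == ByteOrder.BIG_ENDIAN
            && ByteBuffer.wrap(new byte[8]).order() == ByteOrder.BIG_ENDIAN;
    }

    // Decode the first int using the buffer's default order.
    public static int defaultOrderInt(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt();
    }

    // Decode the first int after explicitly switching to little-endian.
    public static int littleEndianInt(byte[] bytes) {
        return ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }
}
```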
Robert Muir (@rmuir) (migrated from JIRA)
> I just wanted to be sure that the position pointer in the buffer does not partly go forward when a read request fails at a buffer boundary, but that seems to be the case.
Yes, this is guaranteed by the APIs, and it is also tested well by TestMultiMMap, which uses random chunk sizes between 20 and 100 (including odd numbers, etc.). That said, we should enhance this test: I think it only retrieves documents at the moment, and it would probably be better if it ran some searches too.
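The API guarantee can be demonstrated in isolation with a small, hypothetical helper. A relative getInt() on a buffer with fewer than four bytes remaining throws BufferUnderflowException. Because the JDK transfers no data on an underflowing relative get, the position stays where it was.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;

// Hypothetical helper: observe the buffer position after getInt() at a given offset.
public class UnderflowPosition {

    // Returns the position after a failed getInt(), or -1 if getInt() succeeded.
    public static int positionAfterFailedGetInt(int capacity, int startPos) {
        ByteBuffer buf = ByteBuffer.allocate(capacity);
        buf.position(startPos);
        try {
            buf.getInt();
            return -1; // four or more bytes were remaining
        } catch (BufferUnderflowException e) {
            return buf.position(); // expected: still startPos
        }
    }
}
```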
Michael McCandless (@mikemccand) (migrated from JIRA)
Good grief! What amazing gains, especially w/ the PFor codec, which of course makes super heavy use of .readInt(). Awesome Robert!
Does this mean that, w/ the cutover to the FORPFOR codec for 4.0, MMapDir will likely have a huge edge over NIOFSDir?
Robert Muir (@rmuir) (migrated from JIRA)
> Good grief! What amazing gains, especially w/ PFor codec which of course makes super heavy use of .readInt(). Awesome Robert!
>
> This will mean w/ the cutover to FORPFOR codec for 4.0, MMapDir will likely have a huge edge over NIOFSDir?
This isn't really a 'gain' for the bulkpostings branch; it just makes DataInput.readInt() faster. Currently the bulkpostings branch uses readBytes() into a byte[], then wraps that in a ByteBuffer and processes an IntBuffer view of it. I switched to using readInt() from the DataInput directly [FrameOfRefDataInput] and found it to be much slower than this IntBuffer method.
This whole benchmark is just benchmarking DataInput.readInt().
So we shouldn't change anything in bulkpostings: in my tests this isn't faster than the IntBuffer method, and at best it's equivalent. But we should fix this slowdown in our APIs.
Robert Muir (@rmuir) (migrated from JIRA)
Committed revision 1050737. I'll wait a bit before committing to branch_3x.
Robert Muir (@rmuir) (migrated from JIRA)
Committed revision 1052892 to branch_3x.
MMapDirectory has some performance problems:
Migrated from LUCENE-2816 by Robert Muir (@rmuir), resolved Dec 26 2010 Attachments: LUCENE-2816.patch