apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0
2.61k stars 1.02k forks source link

Add a MemorySegment Vector scorer - for scoring without copying on-heap #13339

Closed ChrisHegarty closed 4 months ago

ChrisHegarty commented 4 months ago

Add a MemorySegment Vector scorer - for scoring without copying on-heap.

The vector scorer loads values directly from the backing memory segment when available. Otherwise, if the vector data spans across segments, or is a query vector, the scorer copies the vector data on-heap.

A benchmark shows ~2x performance improvement of this scorer over the default copy-on-heap scorer. The benchmark need a little more scrutiny and evaluation on different platforms. Here's the results on a Max M2:

Benchmark                                      (size)   Mode  Cnt  Score   Error   Units
VectorScorerBenchmark.binaryDotProductDefault    1024  thrpt    5  1.391 ± 0.016  ops/us
VectorScorerBenchmark.binaryDotProductMemSeg     1024  thrpt    5  3.013 ± 0.207  ops/us

The scorer currently only operates on vectors with an element size of byte, since loading vector data from float[] (the fallback), is only supported in JDK 22. We can evaluate if and how to support floats separately. See https://bugs.openjdk.org/browse/JDK-8318678

The vector scorer is implicitly tied to the Panama Vector Utils implementation - you can only have a Memory segment scorer if the Panama vector implementation is present. There is a little room for improvement in how these things are initialised and structured.

msokolov commented 4 months ago

So excited to see this finally come to fruition! No more double-buffering!

uschindler commented 4 months ago

How can i change the review to "undecided"?

ChrisHegarty commented 4 months ago

How can i change the review to "undecided"?

I re-requested ur review - so there is no official reviewer yet. Take ur time. I have some luceneutil benchmarks to run, etc.

uschindler commented 4 months ago

How can i change the review to "undecided"?

I re-requested ur review - so there is no official reviewer yet. Take ur time. I have some luceneutil benchmarks to run, etc.

The benchmark code seems broken after your changes.

ChrisHegarty commented 4 months ago

Dismissing Uwe's review, since he is undecided. Can be explicitly added later, when we convince him ;-)

(Oh, this looks harsh!) I hope that I did this right. If not, I apologise. No offence intended. I just want to reflect @uschindler's comment above about being currently undecided.

uschindler commented 4 months ago

Still working on reviewing, busy busy busy and on vacation starting from tomorrow!

ChrisHegarty commented 4 months ago

Still working on reviewing, busy busy busy and on vacation starting from tomorrow!

I appreciate your ongoing review. Pause your review for now, enjoy your vacation, and I'll not merge before you return and have had time to finish review.

msokolov commented 4 months ago

Question / confirmation please -- since this requires Panama, thus JDK 21+ (I think?) it can only target Lucene 10+, correct?

ChrisHegarty commented 4 months ago

Question / confirmation please -- since this requires Panama, thus JDK 21+ (I think?) it can only target Lucene 10+, correct?

Correct that it requires JDK 21+ and Panama Vector, but we can target 9.x also. And I do plan to backport to 9.x.

The difference between 9.x and 10 is that in 9.x the memory segment mmap provider is optional (enableMemorySegments) - so it can be disabled, whereas in 10 it is not. In 9.x we will need an extra check that the memory segment mmap provider is actually enabled, but it is otherwise already hooked off the Panama Vectors provider lookup so it should be fine.

uschindler commented 4 months ago

Question / confirmation please -- since this requires Panama, thus JDK 21+ (I think?) it can only target Lucene 10+, correct?

Correct that it requires JDK 21+ and Panama Vector, but we can target 9.x also. And I do plan to backport to 9.x.

The difference between 9.x and 10 is that in 9.x the memory segment mmap provider is optional (enableMemorySegments) - so it can be disabled, whereas in 10 it is not. In 9.x we will need an extra check that the memory segment mmap provider is actually enabled, but it is otherwise already hooked off the Panama Vectors provider lookup so it should be fine.

Still not on vacation... You can easily backport this. No extra checks needed. If memory segments are disabled in 9.x, the IndexInput would not implement the new interface and therefore the code never triggers. If everything uses instance of checks all is fine.

Basically in earlier versions like jdk 11 or when explicitly disabled the whole thing behaves like if a different directory impl is used.

uschindler commented 4 months ago

Backporting would only require that you may need to duplicate versions for 19, 20, 21+. 19 has no vectorization, but 20 and 21 have identical vector code but differences in memory segments and that may hurt you.

ChrisHegarty commented 4 months ago

Backporting would only require that you may need to duplicate versions for 19, 20, 21+. 19 has no vectorization, but 20 and 21 have identical vector code but differences in memory segments and that may hurt you.

If it hurts too much (to get this on JDK 20) then we'll limit it to JDK 21+. :-)

uschindler commented 4 months ago

Back from vacation. Will look into that till Tuesday!

ChrisHegarty commented 4 months ago

We may add a method like getByteBufferSlice(....).

I experimented locally with similar before, and the performance impact when converting to/from MemorySegment was horrible. I don't disagree that for NIO/BB dir implementations this could be useful, but it would not intersect with the memory segment implementation, since it performs horribly (when converting from BB to MS, and vice versa).

uschindler commented 4 months ago

We may add a method like getByteBufferSlice(....).

I experimented locally with similar before, and the performance impact when converting to/from MemorySegment was horrible. I don't disagree that for NIO/BB dir implementations this could be useful, but it would not intersect with the memory segment implementation, since it performs horribly (when converting from BB to MS, and vice versa).

OK, this is not good. From my code review it looked like the DirectByteBuffer also only has a addres and therefor the whole thing is identical by performance (except that we have to get a MemorySegment slice first and then call toByteBuffer()). So this is interesting, but then I we can't do anything.

Mabe theres some optimizations miissing in ByteBuffer support of vector API.

So looks fine then. Working on a final review.

uschindler commented 4 months ago

Looks like first Java 22 build also worked fine, so no API incompatibilities in JDK (foreign preview vs final): https://jenkins.thetaphi.de/job/Lucene-main-Linux/48322/consoleText

gautamworah96 commented 3 months ago

Hey @ChrisHegarty do we have an open issue for extending this to float vectors? If not, I can create one

Edit: nevermind, I see you had pointed this out earlier

The scorer currently only operates on vectors with an element size of byte, since loading vector data from float[] (the fallback), is only supported in JDK 22. We can evaluate if and how to support floats separately. See https://bugs.openjdk.org/browse/JDK-8318678

Perhaps we can implement some logic to leverage fast dot product scoring for float[] if the runtime is >=22.

ChrisHegarty commented 3 months ago

Hey @ChrisHegarty do we have an open issue for extending this to float vectors? If not, I can create one

Edit: nevermind, I see you had pointed this out earlier

The scorer currently only operates on vectors with an element size of byte, since loading vector data from float[] (the fallback), is only supported in JDK 22. We can evaluate if and how to support floats separately. See https://bugs.openjdk.org/browse/JDK-8318678

Perhaps we can implement some logic to leverage fast dot product scoring for float[] if the runtime is >=22.

I filed the following issue to evaluate adding mode off-heap scoring. https://github.com/apache/lucene/issues/13515

uschindler commented 3 months ago

Hey @ChrisHegarty do we have an open issue for extending this to float vectors? If not, I can create one Edit: nevermind, I see you had pointed this out earlier

The scorer currently only operates on vectors with an element size of byte, since loading vector data from float[] (the fallback), is only supported in JDK 22. We can evaluate if and how to support floats separately. See https://bugs.openjdk.org/browse/JDK-8318678

Perhaps we can implement some logic to leverage fast dot product scoring for float[] if the runtime is >=22.

I filed the following issue to evaluate adding mode off-heap scoring.

Hi, if you map the MemorySegment slice into a ByteBuffer and add a Float view on top of it, it might work.

But I remember that you said using Buffers instead of MemorySegments suffered from bad performance. Because that wozuld have been my favoourite way for off-heap vector operations, because it would also work with old MMapDirectory and ByteBuffersDirectory.

Uwe