alexklibisz / elastiknn

Elasticsearch plugin for nearest neighbor search. Store vectors and run similarity search using exact and approximate algorithms.
https://alexklibisz.github.io/elastiknn
Apache License 2.0

Switch from sun.misc.Unsafe to java.nio.ByteBuffer for vector (de-)serialization (forwards and backwards compatible) #608

Closed alexklibisz closed 9 months ago

alexklibisz commented 9 months ago

Related Issue

#263

Changes

Migrating vector serialization functions from sun.misc.Unsafe to java.nio.ByteBuffer.

The new serialization and deserialization functions are both compatible with the byte format produced by the sun.misc.Unsafe implementation, so users do not need to re-index when they upgrade.
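As a rough illustration of what ByteBuffer-based float serialization can look like, here is a minimal sketch. The class and method names are hypothetical and are not taken from the Elastiknn source. It also assumes the stored format is 4 bytes per float in little-endian order, which is what a raw memory copy produces on typical little-endian hardware.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical sketch; names are illustrative and not Elastiknn's actual API.
final class ByteBufferVectorSketch {

    // Assumption: the on-disk format is 4 bytes per float, little-endian.
    private static final ByteOrder ORDER = ByteOrder.LITTLE_ENDIAN;

    // Serialize a float array into a byte array of length values.length * 4.
    static byte[] writeFloats(float[] values) {
        ByteBuffer buffer = ByteBuffer.allocate(values.length * Float.BYTES).order(ORDER);
        buffer.asFloatBuffer().put(values);
        return buffer.array();
    }

    // Deserialize `length` bytes starting at `offset` back into a float array.
    static float[] readFloats(byte[] bytes, int offset, int length) {
        float[] values = new float[length / Float.BYTES];
        ByteBuffer.wrap(bytes, offset, length).order(ORDER).asFloatBuffer().get(values);
        return values;
    }

    public static void main(String[] args) {
        float[] original = {1.0f, -2.0f, 0.5f};
        byte[] bytes = writeFloats(original);
        float[] restored = readFloats(bytes, 0, bytes.length);
        System.out.println(Arrays.toString(bytes));
        System.out.println(Arrays.equals(original, restored));
    }
}
```

Fixing the byte order explicitly, rather than relying on the platform default, is what keeps the output stable across machines. Whether the real implementation makes the same choice is not stated here.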

The motivations for this switch are:

  1. sun.misc.Unsafe is deprecated and will be removed from the JVM at some point.
  2. sun.misc.Unsafe requires extra security policies to install Elastiknn as a plugin. With ByteBuffer, we're able to remove these policies.
  3. The ByteBuffer API is slightly nicer.

JMH microbenchmarks (throughput in ops/s, higher is better) show the new functions are roughly as fast as or faster than the old ones in all but one case:

[info] Benchmark                                                      Mode  Cnt          Score   Error  Units
[info] VectorSerializationBenchmarks.readFloats_ByteBuffer           thrpt         2121757.541          ops/s
[info] VectorSerializationBenchmarks.readFloats_Unsafe               thrpt         2194790.835          ops/s
[info] VectorSerializationBenchmarks.readInt_ByteBuffer              thrpt       111727965.417          ops/s
[info] VectorSerializationBenchmarks.readInt_Unsafe                  thrpt       165961369.448          ops/s
[info] VectorSerializationBenchmarks.readInts_ByteBuffer             thrpt         2207100.329          ops/s
[info] VectorSerializationBenchmarks.readInts_Unsafe                 thrpt         2136738.508          ops/s
[info] VectorSerializationBenchmarks.writeFloats_ByteBuffer          thrpt         2161012.290          ops/s
[info] VectorSerializationBenchmarks.writeFloats_Unsafe              thrpt         2189182.950          ops/s
[info] VectorSerializationBenchmarks.writeInt_ByteBuffer             thrpt       107505261.268          ops/s
[info] VectorSerializationBenchmarks.writeInt_Unsafe                 thrpt        27993290.467          ops/s
[info] VectorSerializationBenchmarks.writeIntsWithPrefix_ByteBuffer  thrpt         2170718.842          ops/s
[info] VectorSerializationBenchmarks.writeIntsWithPrefix_Unsafe      thrpt         2171472.999          ops/s
[info] VectorSerializationBenchmarks.writeInts_ByteBuffer            thrpt         2191712.505          ops/s
[info] VectorSerializationBenchmarks.writeInts_Unsafe                thrpt         2202876.831          ops/s

The only function that is slower is readInt (about 112M vs. 166M ops/s), but it is only used in tests anyway. writeInt, by contrast, is roughly 3.8x faster with ByteBuffer.

The ann-benchmarks also look good:

[image: ann-benchmarks results]

Testing and Validation

Standard CI and JMH benchmarking. New unit tests also verify forwards/backwards compatibility with the old serialization format.
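One way such a compatibility test can be structured is to compare the new functions against a fixed byte fixture that stands in for data written by the old implementation. The sketch below is hypothetical: the class, the fixture, and the little-endian 4-byte float layout are assumptions for illustration, not the actual tests in this PR.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical sketch of a golden-bytes compatibility check.
final class CompatibilityCheckSketch {

    // Assumed fixture: {1.0f, -2.0f} as little-endian IEEE 754 floats,
    // standing in for bytes written by the old Unsafe-based serializer.
    static final byte[] LEGACY_BYTES = {
        0x00, 0x00, (byte) 0x80, 0x3F, // 1.0f  = 0x3F800000
        0x00, 0x00, 0x00, (byte) 0xC0  // -2.0f = 0xC0000000
    };
    static final float[] EXPECTED = {1.0f, -2.0f};

    static float[] readFloats(byte[] bytes) {
        float[] values = new float[bytes.length / Float.BYTES];
        ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).asFloatBuffer().get(values);
        return values;
    }

    static byte[] writeFloats(float[] values) {
        ByteBuffer buffer =
            ByteBuffer.allocate(values.length * Float.BYTES).order(ByteOrder.LITTLE_ENDIAN);
        buffer.asFloatBuffer().put(values);
        return buffer.array();
    }

    // Backwards compatible: the new reader understands old bytes.
    static boolean backwardsCompatible() {
        return Arrays.equals(readFloats(LEGACY_BYTES), EXPECTED);
    }

    // Forwards compatible: the new writer emits bytes an old reader would accept.
    static boolean forwardsCompatible() {
        return Arrays.equals(writeFloats(EXPECTED), LEGACY_BYTES);
    }
}
```

A fixture-based check like this does not depend on sun.misc.Unsafe being available at test time, at the cost of having to capture the fixture from the old implementation once.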