Closed Pulkitg64 closed 3 weeks ago
TestInt8HnswBackwardsCompatibility.java
and TestBasicBackwardsCompatibility
test cases failing for Lucene versions older than LUCENE_10_0_0
. For older Lucene version, we try to read Index directory
from resources folder. For example for Lucene-9.10.0, the directory path is located at lucene/backward-codecs/src/test/org/apache/lucene/backward_index/index.9.10.0-cfs.zip
.
As per my understanding the segment present at above directory already has KNNVectorField
with MAXIMUM_INNER_PRODUCT as vector similarity function and in above test we try to change similarity function to DOT_PRODUCT
due to which there is conflict happening when adding more docs to it and hence these test cases failing.
Oh okay, that means we need to remove support for COSINE just from indexing side not from searching side?
Correct. Practically, this means keeping the enum constant but marking it as deprecated, and modifying IndexingChain to fail using COSINE on 10.x Lucene indexes.
It would be nice to also add a new file format under lucene/codecs that does cosine similarity, like we just added for hamming distance. This could help with the migration for some users.
Thanks @jpountz for confirming and suggestions.
Thanks @benwtrent for all the feedback. I have raised another revision just to mark the COSINE function as deprecated. I didn't find any internal usages of this function. In a followup PR, I will remove COSINE from writes for LUCENE version >=10.0.0
I didn't find any internal usages of this function.
That doesn't make sense to me. IntelliJ tells me there are over 30 usages of VectorSimilarityFunction.COSINE.
I didn't find any internal usages of this function.
That doesn't make sense to me. IntelliJ tells me there are over 30 usages of VectorSimilarityFunction.COSINE.
Sorry I think I am missing something, I think we cannot remove COSINE from those places, since we require them in LUCENE version < 10.0. They are mostly being used in FieldInfosFormat, VectorReader, ScalarQuantizedVectorWriter classes.
I think the main task is to first get rid of dependency on enum for VectorSimilarityFunction and instead have some interface to get those functions and support.
@Pulkitg64 could you deprecate also VectorUtil.cosine
? That is technically a public API.
Hi @benwtrent, could we merge this change, if everything looks good to you?
@Pulkitg64 sorry for radio silence, I was out of office and swamped with other things. If you could move the change log to 9.12, I can merge and backport.
I somehow messed up with my branch while pulling changes from main
branch. @benwtrent I have created a new PR for marking the cosine VectorSImilarity function as deprecated.
https://github.com/apache/lucene/pull/13473
Sorry for the mess!
Description
Tries to solve #13281 by deprecating
COSINE
VectorSimilarity function for Vector Search.