esmero / strawberryfield

A Field of strawberries
GNU Lesser General Public License v3.0

ISSUE-325: Vectors for each king, one float to rule them all #326

DiegoPino closed this 2 months ago

DiegoPino commented 3 months ago

See #325

This stuff works now.

Requires fixed entries in schema_extra_fields.xml, like:

<!-- ML/vectors -->
<dynamicField name="knn576m_*" type="knn_vector_576" stored="true" indexed="true" multiValued="false"/>
<dynamicField name="knn576s_*" type="pfloat" stored="true" indexed="true" multiValued="false" />
<dynamicField name="knn512m_*" type="knn_vector_512" stored="true" indexed="true" multiValued="false"/>
<dynamicField name="knn512s_*" type="pfloat" stored="true" indexed="true" multiValued="false" />
<dynamicField name="knn1024m_*" type="knn_vector_1024" stored="true" indexed="true" multiValued="false"/>
<dynamicField name="knn1024s_*" type="pfloat" stored="true" indexed="true" multiValued="false" />
<dynamicField name="knn384m_*" type="knn_vector_384" stored="true" indexed="true" multiValued="false"/>
<dynamicField name="knn384s_*" type="pfloat" stored="true" indexed="true" multiValued="false" />
<dynamicField name="knn576m_X3b_und_*" type="knn_vector_576" stored="true" indexed="true" multiValued="false"/>
<dynamicField name="knn576s_X3b_und_*" type="pfloat" stored="true" indexed="true" multiValued="false" />
<dynamicField name="knn512m_X3b_und_*" type="knn_vector_512" stored="true" indexed="true" multiValued="false"/>
<dynamicField name="knn512s_X3b_und_*" type="pfloat" stored="true" indexed="true" multiValued="false" />
<dynamicField name="knn1024m_X3b_und_*" type="knn_vector_1024" stored="true" indexed="true" multiValued="false"/>
<dynamicField name="knn1024s_X3b_und_*" type="pfloat" stored="true" indexed="true" multiValued="false" />
<dynamicField name="knn384m_X3b_und_*" type="knn_vector_384" stored="true" indexed="true" multiValued="false"/>
<dynamicField name="knn384s_X3b_und_*" type="pfloat" stored="true" indexed="true" multiValued="false" />
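Because a pattern like `knn576m_*` also matches names such as `knn576m_X3b_und_...`, it helps to recall how Solr picks among overlapping dynamicField patterns: per the comments in Solr's default schema, longer patterns are matched first, and between equal-length matches the one declared first wins. The sketch below mimics that rule over a subset of the patterns above; `resolve` is a hypothetical helper for illustration, not part of Solr or strawberryfield.

```python
from typing import Optional

# Subset of the dynamicField patterns declared above, in declaration order.
DYNAMIC_FIELDS = {
    "knn576m_*": "knn_vector_576",
    "knn576s_*": "pfloat",
    "knn1024m_*": "knn_vector_1024",
    "knn1024s_*": "pfloat",
    "knn576m_X3b_und_*": "knn_vector_576",
    "knn576s_X3b_und_*": "pfloat",
}


def matches(pattern: str, name: str) -> bool:
    """A dynamicField pattern has a single '*' at its start or its end."""
    if pattern.startswith("*"):
        return name.endswith(pattern[1:])
    if pattern.endswith("*"):
        return name.startswith(pattern[:-1])
    return name == pattern


def resolve(name: str) -> Optional[str]:
    """Return the pattern that would claim `name`, or None if none matches.

    Longest pattern wins; max() keeps the first maximal element, so ties
    fall back to declaration order.
    """
    candidates = [p for p in DYNAMIC_FIELDS if matches(p, name)]
    if not candidates:
        return None
    return max(candidates, key=len)
```

For example, `resolve("knn576m_X3b_und_yolo")` picks `knn576m_X3b_und_*` over the shorter `knn576m_*`, while `resolve("knn576m_yolo")` falls back to `knn576m_*`.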

It also needs the field types provided in schema_extra_types.xml:

<!--
  Dense Vector Field of 384 dimensions suitable for BERT text feature extraction (embeddings), using dot_product as the similarity function
  9.0.0
-->
<fieldType name="knn_vector_384" class="solr.DenseVectorField" vectorDimension="384" similarityFunction="dot_product"/>
<!--
  Dense Vector Field of 512 dimensions suitable for Apple Vision ML image feature prints (embeddings), using dot_product as the similarity function
  9.0.0
-->
<fieldType name="knn_vector_512" class="solr.DenseVectorField" vectorDimension="512" similarityFunction="dot_product"/>
<!--
  Dense Vector Field of 576 dimensions suitable for YOLOv8 feature extraction, using dot_product as the similarity function
  9.0.0
-->
<fieldType name="knn_vector_576" class="solr.DenseVectorField" vectorDimension="576" similarityFunction="dot_product"/>
<!--
  Dense Vector Field of 1024 dimensions suitable for MobileNetV3 feature extraction (embeddings), using dot_product as the similarity function
  9.0.0
-->
<fieldType name="knn_vector_1024" class="solr.DenseVectorField" vectorDimension="1024" similarityFunction="dot_product"/>
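All four types use `similarityFunction="dot_product"`, and the Solr reference guide notes that dot_product expects every vector, both indexed and query vectors, to be unit length (it is meant as a faster cosine similarity). A minimal sketch of the check-and-normalize step a client would need before sending a vector; `VECTOR_DIMENSIONS` and `prepare_vector` are hypothetical names, not taken from strawberry_runners.

```python
import math
from typing import List, Sequence

# vectorDimension of each fieldType declared above.
VECTOR_DIMENSIONS = {
    "knn_vector_384": 384,
    "knn_vector_512": 512,
    "knn_vector_576": 576,
    "knn_vector_1024": 1024,
}


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product; equals cosine similarity when both inputs are unit length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def prepare_vector(vector: Sequence[float], field_type: str) -> List[float]:
    """Check the dimension against the fieldType, then scale to unit length."""
    expected = VECTOR_DIMENSIONS[field_type]
    if len(vector) != expected:
        raise ValueError(f"{field_type} expects {expected} dimensions, got {len(vector)}")
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("a zero vector cannot be normalized")
    return [x / norm for x in vector]
```

Normalizing on the client keeps scores comparable across documents, which matters if a fixed score threshold is later used to cut results.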

Supporting code is provided by https://github.com/esmero/strawberry_runners/pull/92

DiegoPino commented 2 months ago

@alliomeria this still needs a few extra lines of code. It works well, but when I search for vectors and also apply filters, I need to be able to decide whether a filter applies BEFORE the vector search (reduce the set first, then run the expensive KNN search) OR AFTER it (e.g. if I only want results with a score > 0.7).
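The before/after trade-off can be illustrated with a brute-force, exact KNN over a few in-memory documents. This is a toy model, not how Solr implements it (Solr's DenseVectorField searches an approximate HNSW index); the `Doc` fields, the `category` filter and the sample vectors are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class Doc:
    id: str
    vector: List[float]  # assumed unit length, as dot_product requires
    category: str        # hypothetical field to filter on


Hit = Tuple[float, Doc]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def knn(docs: Sequence[Doc], query: Sequence[float], top_k: int) -> List[Hit]:
    """Exact KNN: score every document and keep the top_k by dot product."""
    scored = [(dot(d.vector, query), d) for d in docs]
    scored.sort(key=lambda hit: hit[0], reverse=True)
    return scored[:top_k]


def pre_filter_knn(docs: Sequence[Doc], query: Sequence[float], top_k: int,
                   keep: Callable[[Doc], bool]) -> List[Hit]:
    # Filter BEFORE: shrink the candidate set, then run the expensive KNN.
    return knn([d for d in docs if keep(d)], query, top_k)


def post_filter_knn(docs: Sequence[Doc], query: Sequence[float], top_k: int,
                    keep: Callable[[Doc], bool],
                    min_score: Optional[float] = None) -> List[Hit]:
    # Filter AFTER: take the top_k overall, then drop hits that fail the
    # filter or the score threshold, so fewer than top_k may remain.
    return [(score, d) for score, d in knn(docs, query, top_k)
            if keep(d) and (min_score is None or score > min_score)]


DOCS = [
    Doc("a", [1.0, 0.0], "image"),
    Doc("b", [0.8, 0.6], "text"),
    Doc("c", [0.6, 0.8], "image"),
    Doc("d", [0.0, 1.0], "image"),
]
QUERY = [1.0, 0.0]


def is_image(d: Doc) -> bool:
    return d.category == "image"


print([d.id for _, d in pre_filter_knn(DOCS, QUERY, 2, is_image)])   # ['a', 'c']
print([d.id for _, d in post_filter_knn(DOCS, QUERY, 2, is_image)])  # ['a']
```

With topK=2, pre-filtering still returns two hits, while post-filtering drops `b` after the fact and returns one; a `min_score` threshold behaves the same way, trimming the list after scoring.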

DiegoPino commented 2 months ago

The Solr reference guide explains this in more depth (https://solr.apache.org/guide/solr/latest/query-guide/dense-vector-search.html). This is especially important when using facets. Do we want facets applied before the KNN search (so we always get, e.g., 10 results) or after it (so we may get fewer than 10 even if topK is 10)?
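For reference, the linked guide expresses a KNN search with the `knn` query parser, e.g. `{!knn f=vector topK=10}[1.0, 2.0, 3.0, 4.0]`, and selected facet values usually travel as `fq` filter queries. The sketch below only assembles those request parameters; `knn_params` and the `knn576m_yolo` / `type:image` names are hypothetical. Whether Solr applies such `fq` filters before or after the vector search depends on the Solr version and on the options described in that guide, which is exactly the decision raised above, so it should be checked against the deployed version.

```python
from typing import Iterable, List, Sequence, Tuple


def knn_params(field: str, vector: Sequence[float], top_k: int = 10,
               filters: Iterable[str] = ()) -> List[Tuple[str, str]]:
    """Build request parameters for a Solr KNN search (hypothetical helper)."""
    if top_k <= 0:
        raise ValueError("top_k must be positive")
    # The query vector is sent as a bracketed, comma-separated list of floats.
    vector_literal = "[" + ", ".join(repr(float(x)) for x in vector) + "]"
    params = [("q", f"{{!knn f={field} topK={top_k}}}{vector_literal}")]
    # Each filter (e.g. a selected facet value) becomes its own fq parameter.
    params.extend(("fq", fq) for fq in filters)
    params.append(("fl", "id,score"))
    return params
```

For example, `knn_params("knn576m_yolo", [0.1, 0.2], filters=["type:image"])` returns `[("q", "{!knn f=knn576m_yolo topK=10}[0.1, 0.2]"), ("fq", "type:image"), ("fl", "id,score")]`; a real query vector for that field would have 576 entries. The list of tuples can be passed to `urllib.parse.urlencode`.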