RedisVentures / redis-vector-php

Redis Vector Library (RedisVL) enables Redis as a real-time database for LLM applications, based on Predis PHP client
MIT License

all semantic search items will have a score of 0 #17

Open johnqiuwan opened 1 month ago

johnqiuwan commented 1 month ago

I followed the docs to run the real-time search query. The setup went smoothly, and the query runs without errors.

However, I noticed that every item returned by the semantic search query has a score of 0.

Is that normal?

vladvildanov commented 1 month ago

@johnqiuwan Thanks for reaching out! Could you provide more details on the problem and a reproducible code example?

johnqiuwan commented 1 month ago

Thank you for the quick reply!

Sample code that runs the semantic search:

<?php

namespace App\Services;

use RedisVentures\RedisVl\Vectorizer\Factory;
use RedisVentures\RedisVl\VectorHelper;
use RedisVentures\RedisVl\Query\VectorQuery;
use RedisVentures\RedisVl\Index\SearchIndex;
use Predis\Client;

class VectorQueryService
{
    protected $factory;
    protected $vectorProvider;
    protected $vectorHelper;
    protected $index;
    public function __construct()
    {
        //
        $this->factory = new Factory();
        $this->vectorProvider =
            $this->factory->createVectorizer('openai', env('TEXT_EMBEDDING_MODEL'));
        $this->vectorHelper = new VectorHelper();

        $this->index = new SearchIndex(new Client(), $this->schema());

        $this->index->create();
    }

    private function schema()
    {
        $schema = [
            'index' => [
                'name' => 'idx:product',
                'prefix' => 'laravel_hemes_database_product_by_id:',
                'storage_type' => 'json',
            ],
            'fields' => [
                'id' => [
                    // 'path' => '$.id',
                    'type' => 'numeric',
                ],
                'description' => [
                    // 'path' => '$.description',
                    'type' => 'text',
                ],
                'vector' => [
                    // 'path' => '$.description_embeddings',
                    'type' => 'vector',
                    'dims' => 1536,
                    'datatype' => 'float32',
                    'algorithm' => 'flat',
                    'distance_metric' => 'cosine'
                ],
                'image' => [
                    'type' => 'tag'
                ],
                'slug' => [
                    // 'path' => '$.slug',
                    'type' => 'tag',
                ],
                'product_name_text' => [
                    // 'path' => '$.product_name',
                    'type' => 'text',
                ],
                'price' => [
                    // 'path' => '$.price',
                    'type' => 'numeric',
                    // 'sortable' => true,
                ],
                'current_price' => [
                    // 'path' => '$.price',
                    'type' => 'numeric',
                    // 'sortable' => true,
                ],
                'created' => [
                    // 'path' => '$.created_at',
                    'type' => 'numeric',
                    // 'sortable' => true,
                ],
                'variant_options' => [
                    // 'path' => '$.variant_options',
                    'type' => 'tag',
                ],
                'model' => [
                    // 'path' => '$.product_specifications.model',
                    'type' => 'text',
                ],
                'category' => [
                    //'path' => '$.product_specifications.category',
                    'type' => 'tag',
                ],
                'manufactory' => [
                    // 'path' => '$.product_specifications.manufactory[*]',
                    'type' => 'tag',
                ],
            ],
        ];
        return $schema;
    }

    public function embed($text)
    {
        $embedding = $this->vectorProvider->embed($text);
        $embedding = $embedding['data'][0]['embedding'];

        if (!is_array($embedding)) {
            $embedding = [$embedding];
        }
        return $embedding;
    }

    public function query($embedding)
    {
        // $embedding = [VectorHelper::toBytes($embedding)];

        $query = new VectorQuery($embedding, 'vector', ['id', 'description', 'product_name_text', 'variant_options', 'model', 'category', 'manufactory', 'price', 'current_price',  'slug', 'image'], 10, true, 3);

        return $this->index->query($query);
    }

    public function processResult($result)
    {
        return collect($result)->map(function ($product, $key) {
            return collect($product)->transform(function ($value) {
                return json_decode($value, true);
            });
        })->values()->toArray();
    }

    public function resultDto($result)
    {

        return collect($result)->map(function ($product, $key) {
            $product['id'] = $product['id'][0];
            $product['description'] = $product['description'][0];
            $product['product_name'] = $product['product_name_text'][0];
            $product['slug'] = $product['slug'][0];
            return $product;
        })->toArray();
    }
}

Context:

  1. Using RedisJSON to store the embedding data
  2. Using the OpenAI text-embedding-3-small model to generate the embeddings (1536 dimensions)

Already checked:

  1. The RedisJSON index was created successfully
  2. The embedding data was stored successfully in RedisJSON
  3. There are no errors when performing the search

Problem: All returned items have a score of 0.

Expected behavior: The scores should not all be 0.

Versions:

Additional context: If the vector value is updated in RedisJSON, the search results update accordingly. The search itself seems to work; only the scores are all 0.
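
For anyone reproducing this, a minimal sketch of a local sanity check follows. It computes the cosine distance (1 minus cosine similarity, which is what the schema's `cosine` metric measures) between a query embedding and a stored embedding in plain PHP. The function name `cosineDistance` is hypothetical and not part of redis-vector-php. If this value is clearly non-zero for a pair whose returned score is 0, the returned score is probably not the vector distance.

```php
<?php

/**
 * Cosine distance between two equal-length vectors: 1 - cosine similarity.
 * Hypothetical helper for local verification only; not a library function.
 */
function cosineDistance(array $a, array $b): float
{
    if (count($a) === 0 || count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must be non-empty and of equal length.');
    }

    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;

    // Use array_values() so both vectors are indexed 0..n-1.
    $a = array_values($a);
    $b = array_values($b);

    foreach ($a as $i => $value) {
        $dot += $value * $b[$i];
        $normA += $value * $value;
        $normB += $b[$i] * $b[$i];
    }

    if ($normA == 0.0 || $normB == 0.0) {
        throw new InvalidArgumentException('Cosine distance is undefined for a zero vector.');
    }

    return 1.0 - $dot / (sqrt($normA) * sqrt($normB));
}
```

Identical vectors give a distance of 0, orthogonal vectors give 1, and opposite vectors give 2, so a result set in which every item reports exactly 0 would only be expected if every stored vector pointed in the same direction as the query.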

johnqiuwan commented 3 weeks ago

Are there any updates on this, @vladvildanov? Thank you.

vladvildanov commented 3 weeks ago

@johnqiuwan By default, Redis calculates scores based on term frequency and the term's occurrences in the document. Could you try one of the other scorers that Redis provides out of the box? It feels like this is related to the server side.

https://redis.io/docs/latest/develop/interact/search-and-query/advanced-concepts/scoring/
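
One possible reading, not confirmed anywhere in this thread: the score discussed above is the text-relevance score, while a KNN query reports the vector distance as a separate field named by the `AS` clause. The sketch below only builds the argument list for a raw `FT.SEARCH` KNN query so the two values can be compared side by side. The function name `buildKnnSearchArgs` and the alias `vector_distance` are assumptions made for illustration, and how redis-vector-php builds its own query internally is not shown in the source.

```php
<?php

/**
 * Build the arguments for a raw FT.SEARCH KNN query (query dialect 2).
 * Hypothetical helper: it does not talk to Redis, it only returns the array.
 */
function buildKnnSearchArgs(
    string $index,
    string $vectorField,
    array $embedding,
    int $k,
    array $returnFields,
    string $alias = 'vector_distance' // assumed alias for the yielded distance
): array {
    // The query vector is sent as a blob of little-endian float32 values.
    $blob = pack('g*', ...array_values($embedding));

    // Single quotes keep "$vec" literal; it is resolved by Redis via PARAMS.
    $query = sprintf('*=>[KNN %d @%s $vec AS %s]', $k, $vectorField, $alias);

    // Return the requested fields plus the distance alias.
    $fields = array_merge(array_values($returnFields), [$alias]);

    return array_merge(
        ['FT.SEARCH', $index, $query, 'PARAMS', '2', 'vec', $blob],
        ['SORTBY', $alias],
        ['RETURN', (string) count($fields)],
        $fields,
        ['DIALECT', '2']
    );
}

// Possible usage with Predis (requires a running Redis server with search enabled):
// $raw = (new Predis\Client())->executeRaw(
//     buildKnnSearchArgs('idx:product', 'vector', $embedding, 10, ['id', 'description'])
// );
```

If the `vector_distance` field in the raw reply is non-zero while the score reported by the library stays at 0, that would suggest the two values come from different places, which is worth checking before changing scorers.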

johnqiuwan commented 3 weeks ago

Thank you for the update! I have read the doc you linked, but I am still not sure why all returned items have a score of 0. It looks like a bug to me, yet the results do change when the embedding is updated. This is kind of strange to me, lol

I appreciate your time and your amazing work.

vladvildanov commented 3 weeks ago

Thank you so much! Let me know if you find something or feel free to contribute 👌