StackExchange / StackExchange.Redis

General purpose redis client
https://stackexchange.github.io/StackExchange.Redis/

IndexOutOfRangeException for RedisResult.ToDictionary #2472

Closed giuliohome closed 1 year ago

giuliohome commented 1 year ago

I'm trying the command info = db.Execute("KEYS","*"), but RedisResult.ToDictionary throws an IndexOutOfRangeException.

It should be an array of strings.

using StackExchange.Redis;

var r = ConnectionMultiplexer.Connect("localhost:6379");

var db = r.GetDatabase();

var info = db.Execute("KEYS", new string[] { "*"});

var dict = info.ToDictionary();
System.IndexOutOfRangeException
  HResult=0x80131508
  Message=Index was outside the bounds of the array.
  Source=StackExchange.Redis
  StackTrace:
   at StackExchange.Redis.RedisResult.ToDictionary(IEqualityComparer`1 comparer)

However, the explicit operator is working fine

var dict = ((string?[]?)info);

foreach (var item in dict ?? Array.Empty<string>())
{
    Console.WriteLine("key  " + item);
}
giuliohome commented 1 year ago

fair enough, fyi F# version would be

let getAll () : (string * RedisEmbedding) seq = 
   let db = getDB ()
   let info = db.Execute("KEYS","*") 
   let info_dict : string array = RedisResult.op_Explicit(info) 
mgravell commented 1 year ago

You should almost never issue KEYS *, and even if you used SCAN, this is a per-server operation not a global operation (this matters for "cluster"). There is an API on IServer to do what you want, more efficiently. Finally note that the result of KEYS is not a dictionary - it is a list/array of key names. It doesn't make sense to convert it to a dictionary, as a dictionary is a key/value pair mapping, and KEYS does not return values.
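As a rough sketch of the IServer route mentioned above (assuming a local server at localhost:6379, and assuming replicas should be skipped so each key is listed once), key enumeration could look like this. IServer.Keys pages through the keyspace for you instead of returning everything in one blocking reply.

```csharp
using System;
using StackExchange.Redis;

// Assumption: a single local Redis endpoint; adjust the connection string as needed.
var muxer = ConnectionMultiplexer.Connect("localhost:6379");

// Key enumeration is a per-server operation, so walk every endpoint explicitly.
foreach (var endpoint in muxer.GetEndPoints())
{
    IServer server = muxer.GetServer(endpoint);

    // Assumption: replicas hold the same keys as their primary, so skip them.
    if (server.IsReplica) continue;

    // Keys() is lazily enumerated and fetches results page by page.
    foreach (RedisKey key in server.Keys(pattern: "*", pageSize: 250))
    {
        Console.WriteLine($"key  {key}");
    }
}
```

The result is a sequence of key names, not key/value pairs, which is why a dictionary conversion does not apply here.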

giuliohome commented 1 year ago

What do I want to do? I need to rebuild the keys and the values of an index (it comes from Azure OpenAI embeddings and metadata, but that's irrelevant here). I've already done it, tbh, but I see it takes a few minutes and, fwiw, I'm wondering why. It's running on a Kubernetes cluster on Azure. Which API calls exactly would you suggest I use? Btw the Python version is really straightforward (because there are Python wrappers for Redis that do what we want in a single API call, without the need to execute many low-level Redis commands), but I'm struggling to bring .NET F# to my team. All those op_Explicit converters didn't help much, but eventually it all worked... If there are simpler APIs for rebuilding the keys and values of a hash index (aside from keys and many hgetall and hset etc.), I would be more than happy to use them.

Thanks for your kind and helpful reply

mgravell commented 1 year ago

What exactly are you trying to do? "rebuild the keys and the values of an index" is incredibly vague. You mention hashes (hgetall, hget), but: keys has nothing whatsoever to do with hashes. Are you trying to get all the keys and values of a single hash, by key? If so: there are methods for that - maybe .HashGetAll(key).ToStringDictionary() or similar?
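A minimal sketch of that suggestion, assuming a local server and a hypothetical hash key "doc:1" with a hypothetical binary field named "content_vector" (neither name comes from this thread):

```csharp
using System;
using StackExchange.Redis;

var muxer = ConnectionMultiplexer.Connect("localhost:6379");
var db = muxer.GetDatabase();

// "doc:1" is a hypothetical key used only for illustration.
HashEntry[] entries = db.HashGetAll("doc:1");

// Text-only hashes: field name -> string value.
var asStrings = entries.ToStringDictionary();
foreach (var pair in asStrings)
{
    Console.WriteLine($"{pair.Key} = {pair.Value}");
}

// Hashes with binary fields (such as an embedding vector): keep the values
// as RedisValue and cast the binary ones to byte[] instead of string.
var asValues = entries.ToDictionary();
if (asValues.TryGetValue("content_vector", out RedisValue raw)) // hypothetical field name
{
    byte[]? vectorBytes = (byte[]?)raw;
    Console.WriteLine($"vector bytes: {vectorBytes?.Length ?? 0}");
}
```

ToStringDictionary is convenient for metadata, but converting a float32 vector to a string would likely corrupt it, hence the second form.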

giuliohome commented 1 year ago

This is what I see and what I am doing, as specific as I can make it.

I have 500 embedding vectors (with metadata) in my Redis db. If I issue one "keys" cmd, it returns 500 strings. For each of those 500 keys I can run an hgetall and it will return the metadata and the embedding vector of that specific "key". All of this is triggered by a ChatGPT API on Azure, from internal docs and from a user question, but we don't care; this part can remain vague. The fact is that there will be a KNN query for similarity search later on.

Now sometimes I need to rebuild the whole index (let's say we changed the source docs), hence maybe I will have to insert 700 different vectors and metadata for next search.

But my point is that the "keys" call returns the list of the keys for which I can read metadata through hgetall or I can write metadata (and embedding 1536 float32/byte vector) through hset.

I'm not sure if there is something else I should call instead of "keys" but that is working at least as I have tried to describe above

slorello89 commented 1 year ago

@giuliohome - if you are using redis as a vector database I'm assuming you are using RediSearch - you should just query the index with * as the query string- that will bring everything back for you.

slorello89 commented 1 year ago

https://redis.io/docs/stack/search/reference/vectors/

Assuming you're using this?

giuliohome commented 1 year ago

Yes sure RediSearch of course. Will try the query * Thanks but what about the building part? Should I still do many hset - e.g. 500 hset... this is the slow part - and finally a FT.Create (this seems fast instead) ? No other high level .net wrappers here? Thanks again

edit

@giuliohome - if you are using redis as a vector database I'm assuming you are using RediSearch - you should just query the index with * as the query string- that will bring everything back for you.

ok, tested now

to get the total number of items

127.0.0.1:6379> FT.SEARCH openai-langchain-redis-aks * LIMIT 0 0
1) (integer) 501

their ids

127.0.0.1:6379> FT.SEARCH openai-langchain-redis-aks * NOCONTENT LIMIT 0 501

yes, correct, I should be able to replace KEYS with FT.SEARCH

(regarding the "slow" part, I later found that it is not related to Redis, and anyway I'm now using Redis pipelines, i.e. IBatch)
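For reference, the second redis-cli command above could be issued from .NET roughly as follows. This is only a sketch: it assumes the default RESP2 reply shape for FT.SEARCH with NOCONTENT (a flat array holding the total count followed by the document ids) and reuses the index name from the session above.

```csharp
using System;
using StackExchange.Redis;

var muxer = ConnectionMultiplexer.Connect("localhost:6379");
var db = muxer.GetDatabase();

// Same arguments as the redis-cli session above, passed through the raw Execute API.
RedisResult reply = db.Execute(
    "FT.SEARCH", "openai-langchain-redis-aks", "*", "NOCONTENT", "LIMIT", 0, 501);

// Assumption: RESP2 layout, i.e. [ total, id1, id2, ... ].
var items = (RedisResult[]?)reply ?? Array.Empty<RedisResult>();
if (items.Length > 0)
{
    long total = (long)items[0];
    Console.WriteLine($"total: {total}");

    for (int i = 1; i < items.Length; i++)
    {
        Console.WriteLine($"id  {(string?)items[i]}");
    }
}
```

As with KEYS, the reply is a list rather than a key/value mapping, so the array conversion is the appropriate one.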

slorello89 commented 1 year ago

I'm pretty sure that if you run the FT.CREATE first, the documents will be indexed synchronously as they are inserted - that's how it works with everything else in Search - but I'm not 100% sure on vectors. NRedisStack has some support for vectors - and I will probably add vectors to Redis OM when I get some bandwidth (we've piloted it in Redis OM Spring)

giuliohome commented 1 year ago

Here in their docs

they write

Before creating the index let's describe the dataset and insert entries.

and from my experience, if I remember well, my queries get the correct responses when the index is built after the vectors are inserted (I can be wrong here, considering the following).

What I find "slow" is the part where I am inserting (hset) the embeddings, not the FT.CREATE... Maybe it's because I need to use a pipeline: it's on my todo list for next week to look at the pipeline API here. In the doc about pipelines, though, .NET tasks are mentioned as a sort of equivalent, which confuses me, because in the Redis manual I read instead

Pipelining is not just a way to reduce the latency cost associated with the round trip time, it actually greatly improves the number of operations you can perform per second in a given Redis server

Of course I have used async tasks for my hset operations in my .NET code, hence if this is the .NET equivalent of Redis pipelines, there would be nothing else I could do to optimize my code... but that doesn't seem to be the case, from googling around

Redis offers a feature called pipeline that allows you to bulk send commands. This can drastically improved performance if you are running queries that can be batched together. The reason for this is that you only need to traverse the network to redis once, then Redis will run all the commands and return the result. In this article, we will learn how to use redis pipelining with Python.

.NET async tasks, per se, would not reduce the number of network round trips; they only make them non-blocking for the client execution flow... Indeed redis-py does something very different: it calls "pack_commands" on the connection using the Redis protocol. I guess I have to study StackExchange.Redis.IBatch: db.CreateBatch() and batch.HashSetAsync( ... ). Update: I prepared a PoC here, and it runs OK in a GitLab CI pipeline. After further analysis in my environment I've found that all my F# code for Redis runs smoothly and fast (the slow part was external to Redis).
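A minimal sketch of the IBatch approach, shown in C# for comparison with the snippets earlier in this thread. The key prefix "doc:", the field names, the item count, and the zero-filled placeholder vector are all assumptions made for illustration. Commands queued on the batch are only sent when Execute() is called, so the tasks must be awaited after that call, not before.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using StackExchange.Redis;

var muxer = await ConnectionMultiplexer.ConnectAsync("localhost:6379");
var db = muxer.GetDatabase();

IBatch batch = db.CreateBatch();
var pending = new List<Task>();

for (int i = 0; i < 500; i++)
{
    // Placeholder for a 1536-dimension float32 embedding (hypothetical data).
    var vector = new byte[1536 * sizeof(float)];

    // Queue the HSET; nothing is sent to the server yet.
    pending.Add(batch.HashSetAsync($"doc:{i}", new HashEntry[]
    {
        new HashEntry("content", $"text {i}"),     // hypothetical field name
        new HashEntry("content_vector", vector),   // hypothetical field name
    }));
}

// Send all queued commands to the server as one contiguous block...
batch.Execute();

// ...and only then wait for the replies.
await Task.WhenAll(pending);
```

Awaiting each HashSetAsync inside the loop would wait for every reply before sending the next command; collecting the tasks and awaiting them together avoids paying one round trip per item.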

Well, a wrapper would help prevent this sort of problem or doubt. Actually this is what the Python wrapper is doing under the hood: see https://github.com/hwchase17/langchain/blob/master/langchain/vectorstores/redis.py#L213 (and yeah, they create the index before adding the docs)