Closed giuliohome closed 1 year ago
fair enough, fyi F# version would be
let getAll () : (string * RedisEmbedding) seq =
    let db = getDB ()
    let info = db.Execute("KEYS","*")
    let info_dict : string array = RedisResult.op_Explicit(info)
You should almost never issue KEYS *, and even if you used SCAN, this is a per-server operation, not a global operation (this matters for "cluster"). There is an API on IServer to do what you want, more efficiently. Finally, note that the result of KEYS is not a dictionary - it is a list/array of key names. It doesn't make sense to convert it to a dictionary, as a dictionary is a key/value pair mapping, and KEYS does not return values.
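As a rough illustration of the IServer route mentioned above, here is a minimal F# script sketch. It assumes StackExchange.Redis 2.x and a server reachable at localhost:6379; the address, the `allKeys` helper name, and the page size are assumptions for illustration, not something from this thread. Because key enumeration is per server, the sketch visits every connected primary endpoint.

```fsharp
#r "nuget: StackExchange.Redis"
open StackExchange.Redis

/// Hypothetical helper: enumerate key names matching a pattern on every
/// connected primary server. IServer.Keys pages through the keyspace
/// (using SCAN where the server supports it) instead of one blocking KEYS call.
let allKeys (mux: ConnectionMultiplexer) (pattern: string) : string seq =
    let redisPattern : RedisValue = RedisValue.op_Implicit pattern
    mux.GetEndPoints()
    |> Seq.map (fun endpoint -> mux.GetServer(endpoint))
    // Skip replicas so a clustered setup does not report each key twice.
    |> Seq.filter (fun server -> server.IsConnected && not server.IsReplica)
    |> Seq.collect (fun server -> server.Keys(pattern = redisPattern, pageSize = 500))
    |> Seq.map (fun key -> key.ToString())

// Assumed connection string; adjust for your environment.
let mux = ConnectionMultiplexer.Connect("localhost:6379")
allKeys mux "*" |> Seq.iter (printfn "%s")
```

The result is a lazy sequence of key names only; reading the values still needs a separate call per key.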
What do I want to do? I need to rebuild the keys and the values of an index (it comes from Azure OpenAI embeddings and metadata, but that's irrelevant here). I've already done it, tbh, but I see it takes a few minutes and, fwiw, I'm wondering why. It's running on a Kubernetes cluster on Azure.
Which API calls exactly would you suggest I use? Btw the Python version is really straightforward (because there are Python wrappers for Redis that do what we want in a single API call, without the need to execute many low-level Redis commands), but I'm struggling to bring .NET F# to my team. All those op_Explicit converters didn't help much, but eventually it all worked... If there are simpler APIs for rebuilding the keys and values of a hash index (aside from keys and many hgetall and hset etc.), I would be more than happy to use them.
Thanks for your kind and helpful reply
What exactly are you trying to do? "rebuild the keys and the values of an index" is incredibly vague. You mention hashes (hgetall, hget), but: keys has nothing whatsoever to do with hashes. Are you trying to get all the keys and values of a single hash, by key? If so: there are methods for that - maybe .HashGetAll(key).ToStringDictionary() or similar?
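The suggestion above could look roughly like this from F#. It is a sketch that assumes StackExchange.Redis 2.x; the helper names and the idea of reading the vector field separately are illustrative assumptions. A binary field such as a float32 embedding is better read as raw bytes, since converting it to a string would mangle it.

```fsharp
#r "nuget: StackExchange.Redis"
open StackExchange.Redis

/// Hypothetical helper: read every field of one hash as strings.
/// Suitable for text metadata, not for binary fields.
let readHashAsStrings (db: IDatabase) (key: string) =
    let redisKey : RedisKey = RedisKey.op_Implicit key
    db.HashGetAll(redisKey).ToStringDictionary()

/// Hypothetical helper: read one binary field (for example an embedding
/// vector) as raw bytes. The result is null when the field does not exist.
let readFieldBytes (db: IDatabase) (key: string) (field: string) : byte[] =
    let redisKey : RedisKey = RedisKey.op_Implicit key
    let redisField : RedisValue = RedisValue.op_Implicit field
    let value = db.HashGet(redisKey, redisField)
    let bytes : byte[] = RedisValue.op_Implicit value
    bytes
```

The type annotations on each `op_Implicit` result are there so F# can pick the right conversion overload, the same trick the `RedisResult.op_Explicit` line in the original snippet relies on.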
This is what I see and what I am doing, as concretely as I can put it.
I have 500 embedding vectors (with metadata) in my Redis db. If I issue one "keys" cmd, it returns 500 strings. For each of those 500 keys I can run an hgetall and it will return the metadata and the embedding vector of that specific "key". That is triggered by a ChatGPT API from Azure, from internal docs and from a user question, but we don't care; this part can remain vague. The fact is that there will be a KNN query for similarity search later on.
Now sometimes I need to rebuild the whole index (let's say we changed the source docs), hence maybe I will have to insert 700 different vectors and metadata for next search.
But my point is that the "keys" call returns the list of the keys for which I can read metadata through hgetall or I can write metadata (and embedding 1536 float32/byte vector) through hset.
I'm not sure if there is something else I should call instead of "keys" but that is working at least as I have tried to describe above
@giuliohome - if you are using redis as a vector database I'm assuming you are using RediSearch - you should just query the index with * as the query string- that will bring everything back for you.
https://redis.io/docs/stack/search/reference/vectors/
Assuming you're using this?
Yes, sure, RediSearch of course. Will try the query *, thanks. But what about the building part? Should I still do many hset calls (e.g. 500 hset, this is the slow part) and finally an FT.CREATE (this seems fast instead)? No other high-level .NET wrappers here? Thanks again
@giuliohome - if you are using redis as a vector database I'm assuming you are using RediSearch - you should just query the index with * as the query string- that will bring everything back for you.
ok, tested now
to get the total number of items
127.0.0.1:6379> FT.SEARCH openai-langchain-redis-aks * LIMIT 0 0
1) (integer) 501
their ids
127.0.0.1:6379> FT.SEARCH openai-langchain-redis-aks * NOCONTENT LIMIT 0 501
yes, correct, I should be able to replace KEYS with FT.SEARCH
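For reference, the two FT.SEARCH calls above can be issued from F# through the same generic `Execute` call used earlier in this thread. This is only a sketch: it assumes StackExchange.Redis 2.x on the default RESP2 protocol, where a NOCONTENT reply is an array holding the total count followed by the document ids, and `searchIds` is a hypothetical helper name.

```fsharp
#r "nuget: StackExchange.Redis"
open StackExchange.Redis

/// Hypothetical helper: run a match-all FT.SEARCH with NOCONTENT and
/// return (total hits, document ids). Pass maxIds = 0 to get only the count.
let searchIds (db: IDatabase) (index: string) (maxIds: int) : int64 * string[] =
    let args : obj[] =
        [| box index; box "*"; box "NOCONTENT"; box "LIMIT"; box 0; box maxIds |]
    let reply = db.Execute("FT.SEARCH", args)
    // Assumed RESP2 reply shape: [ total; id1; id2; ... ]
    let items : RedisResult[] = RedisResult.op_Explicit reply
    let total : int64 = RedisResult.op_Explicit items.[0]
    let ids =
        items.[1..]
        |> Array.map (fun (item: RedisResult) ->
            let id : string = RedisResult.op_Explicit item
            id)
    total, ids
```

With the index from the session above, `searchIds db "openai-langchain-redis-aks" 501` would correspond to the second command. Without an explicit LIMIT, FT.SEARCH returns only the first 10 results, which is why the limit is passed every time.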
(regarding the "slow" part, I later found that it is not related to Redis, and anyway I'm now using Redis pipelines, i.e. IBatch)
I'm pretty sure that if you run the FT.CREATE first, the documents will be indexed synchronously as they are inserted - that's how it works with everything else in Search - but I'm not 100% sure on vectors. NRedisStack has some support for vectors - and I will probably add vectors to Redis OM when I get some bandwidth (we've piloted it in Redis OM Spring)
they write
Before creating the index let's describe the dataset and insert entries.
and from my experience, if I remember well, my queries get the correct responses when the index is built after the vectors are inserted (I can be wrong here, considering the following).
What I find "slow" is the part where I am inserting (hset) the embeddings, not the FT.CREATE... Maybe it's because I need to use a pipeline: it's on my todo list for next week to look at the pipeline API here. But in the doc about pipelines, .NET tasks are mentioned as sort of equivalent, so I'm confused... In the Redis manual I read instead:
Pipelining is not just a way to reduce the latency cost associated with the round trip time, it actually greatly improves the number of operations you can perform per second in a given Redis server
Of course I have used async tasks for my hset operations in my .NET code, so if this is the .NET equivalent of Redis pipelines, there would be nothing else I can do to optimize my code... That doesn't seem to be the case, from googling around:
Redis offers a feature called pipeline that allows you to bulk send commands. This can drastically improved performance if you are running queries that can be batched together. The reason for this is that you only need to traverse the network to redis once, then Redis will run all the commands and return the result. In this article, we will learn how to use redis pipelining with Python.
.Net async tasks - per se - would not reduce the number of network roundtrips but they only make them non-blocking for the client execution flow... Indeed redis py does something very different: it "pack_commands" on the connection using redis protocol.
I guess I have to study StackExchange.Redis.IBatch! db.CreateBatch() and batch.HashSetAsync(... => prepared PoC here, running OK in a GitLab CI pipeline. After further analysis in my environment I've found that all my F# code for Redis runs smoothly and fast (the slow part was external to Redis).
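A batched insert along the lines of the db.CreateBatch() / batch.HashSetAsync approach mentioned above might look like the following. This is a sketch, not the PoC from this thread: it assumes StackExchange.Redis 2.x, and `vectorBytes` and `writeHashes` are hypothetical helper names.

```fsharp
#r "nuget: StackExchange.Redis"
open System
open System.Threading.Tasks
open StackExchange.Redis

/// Hypothetical helper: pack a float32 vector into the raw byte layout
/// that would be stored in a hash field.
let vectorBytes (vector: float32[]) : byte[] =
    let bytes : byte[] = Array.zeroCreate (vector.Length * sizeof<float32>)
    Buffer.BlockCopy(vector, 0, bytes, 0, bytes.Length)
    bytes

/// Hypothetical helper: queue one HSET per document on a single IBatch,
/// send them together, then wait for all replies.
let writeHashes (db: IDatabase) (docs: (string * HashEntry[]) seq) =
    let batch = db.CreateBatch()
    let pending =
        docs
        |> Seq.map (fun (key, entries) ->
            let redisKey : RedisKey = RedisKey.op_Implicit key
            // Queued locally; nothing is sent until Execute is called.
            batch.HashSetAsync(redisKey, entries))
        |> Seq.toArray // force the lazy sequence so every command is queued
    batch.Execute()
    Task.WaitAll(pending)
```

Forcing the sequence with `Seq.toArray` before `Execute` matters here: a lazy `seq` would otherwise queue the commands only when the tasks are enumerated, after the batch has already been sent.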
Well, a wrapper would help prevent this sort of problem/doubt. Actually this is what the Python wrapper is doing under the hood: see https://github.com/hwchase17/langchain/blob/master/langchain/vectorstores/redis.py#L213 (and yeah, they create the index before adding the docs)
I'm trying a command
info = db.Execute("KEYS","*")
but RedisResult.ToDictionary throws an IndexOutOfRangeException.
It should be an array of strings.
However, the explicit operator is working fine