How to scan via wildcard efficiently?

StackExchange / StackExchange.Redis

General purpose redis client

https://stackexchange.github.io/StackExchange.Redis/

Other

5.89k stars 1.51k forks source link

How to scan via wildcard efficiently? #2464

Closed Ekotenggara closed 1 year ago

Ekotenggara commented 1 year ago

I have a few million keys in redis, following this pattern : "R:number-number-number-number", for example: "R-1-2-1-2", "R-2-3-1-2" how do I know that the key matches with "R:-123-" exists in my collection?

In redis cli, I can use : SCAN 0 MATCH "R:-123-" This returns very fast, both whether the wildcard is found or not found.

This is my attempt to do it via StackExchange.Redis in C#.

using StackExchange.Redis;

string connectionString = "my_redis_connection_string";
ConnectionMultiplexer redis = ConnectionMultiplexer.Connect(connectionString);
IDatabase database = redis.GetDatabase();

string pattern = "R:*-123-*"; 
var matchingKeys = redis.GetServer(redis.GetEndPoints()[0])
    .Keys(database.Database, pattern);

bool keyExists = matchingKeys.Any();

This works, it is fast to return 'true' when the wildcard is found, but it is very slow to return 'false' when the wildcard is not found.

What did I do wrong? How do I need to do to speed it up? Thanks

mgravell commented 1 year ago

Can I clarify: in your fast version, are you just issuing:

SCAN 0 MATCH "R:-123-"

or are you doing a scan loop using the return value, looping until you see the token zero? If you don't have that loop: your fast version is quickly giving you not an answer. The entire point of SCAN is that to avoid extended blocking, it only looks at a tiny portion of the keyspace per call. If it returns a match on the first page, then sure you can report "hit" quickly, but: you have no idea whether you have a "miss" until you have iterated the entire keyspace, hence the need for the loop.

I strongly suspect this misunderstanding is behind the perceived difference here, and that if you fix the implementation by adding a loop, I suspect you'll find it behaves similarly.

And to be explicit: there does not currently exist, in redis, an efficient way to ask the question you have here.

You might be able to improve performance some by increasing the page size via the optional parameter.

mgravell commented 1 year ago

To put that in human terms, each call to SCAN (with a successive resume token) is like reading a single page of a book. If you want to know if a book contains the word "asparagus", it is very fast to look at just the first page. If the first page contains the word "asparagus": you have your answer - it does! But if you don't find the word on the first page, and don't look at any other pages, then: you still have no clue whether the book contains the word.

Answering the question properly - especially in the "no" case - is slower, as it involves reading the entire book.

Ekotenggara commented 1 year ago

Thank you @mgravell , I completely understand now, thank you for your explanation. After rethinking the situation, I ended up using nredisearch via aggregationbuilder to do my search and I can achieve what I need that way. Thank you again.