Open romange opened 1 year ago
I think you meant to write RediSearch in the title, not JsonSearch.
You are right, but since the subset of functionality I want to focus on is within JSON, this mistake makes sense 😄
Could be a great MVP for the query part.
I'm not a RediSearch user (yet) but have been very interested in it recently as it seems to be exactly what I need for my use case.
In particular, the vector similarity search can be a killer feature to have in dragonfly.
In general, RediSearch seems to be an all-around great feature, and having it in dragonfly would, in my opinion, bring lots of adoption. Just my 2 cents.
My project relies heavily on the combination of RediSearch and RedisJSON, running roughly 100-300 FT.SEARCH commands per second over 2-3 million records to obtain results that meet various conditions. Additionally, the TTL of my 200-300 million records is only around 180-480 seconds, so both the write load and the read load (FT.SEARCH, with the requirement that the average query returns within 300ms) on the Redis cluster are quite high. As a result, I had to build a Redis cluster to meet these demands, which makes the overall maintenance cost relatively high. Therefore, I'm looking for an architecture that can achieve the same effect at a lower cost. If DragonflyDB can provide search capabilities similar to RediSearch and RedisJSON, I would be willing to give it a try.
Can you provide an example for a typical query that you send? Do you need word stemming, multiple languages support in full text search?
I currently do not need to use stemming because my project is mainly to help users match mobile phone numbers related to their favorite numbers. Since the matching is all about numeric strings, even if it is a Chinese project, I don't need to use Chinese, just numbers and English letters. Here is my search example:
FT.SEARCH tm '@hitMassRuleId:lastABABAB|anyABCDABCD|lastAAABBB|lastAABBCC|lastABCABC|lastABCDDBCAXXX|anyAABBCC|anyAAABBB|lastAAAAB|lastAAAAA|anyABCDEF|anyAAAAA|lastAABBB|lastABCDABDCXXX|lastABCDBACD|lastABCDBACDXXX|lastABCDDCBA|lastAAAA|lastABCDACBDXXX|lastABBA|lastABBCBB|lastABCDABDC|anyAAAA|lastAABB|anyABABAB|anyABCABC|anyAAAAB|midAAAA|anyAAABB|anyABBCBB|lastABABtu368|anyAABBB|lastABBB|lastABABtu613|lastABAB|lastABABtu850|midBAAA|lastABCD|midAAAB|lastAABCC|midAABB|lastBrithYear758799|lastAAAB|midABCD|midABAB|any888|any666|lastAABAAXXX|lastABCCBAXX|anyABAB|anyABABtu368|anyBrithYear758799|anyABBA|anyAABB|anyABABtu850|anyABABtu613|lastABB|anyABCD|lastXAXAXAXA|lastXAXAXA|lastABAC|lastAXAXAX|anyAAA|head1889|lastABACAD|anyABBCDD|lastAXAXAXAX @ttlInSecond:[1683364311 +inf] @providerCode:jyxf @status:1 @preOrderTime:[-inf (1683364009] @touchCode:P0000008 @province:beijing @city:beijing @last4:7777' LIMIT 0 1
Looks like a structured search; I do not see any full-text search requirements here, but maybe I am missing something.
I use full-text search by decomposing all possible combinations of a phone number into individual keys in a JSON object and then setting these JSON keys as index values in RediSearch. For example, in order to match phone numbers similar to a user's license plate number, I specifically decompose various possible combinations of 5 consecutive digits of the target phone number, as follows:
{
  "phone": "13085669245",
  "status": "1",
  "owner": "",
  "ttlInSecond": 0,
  "preOrderTime": 0,
  "providerCode": "beijing",
  "rule_car": {
    "any5": "69245",
    "any4": "X9245,6X245,69X45,692X5,6924X",
    "any3": "XX245,X9X45,X92X5,X924X,6XX45,6X2X5,6X24X,69XX5,69X4X,692XX",
    "tail5": "69245",
    "tail4": "9245",
    "tail3": "245",
    "continuous5": "66924,56692,85669,08566,30856,13085",
    "continuous4": "6924,6692,5669,8566,0856,3085,1308",
    "continuous3": "924,692,669,566,856,085,308,130"
  }
}
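The decomposition above can be reproduced with a short script. This is only a sketch inferred from the single example record: the function names (`masked_variants`, `windows`, `build_rule_car`) are made up for illustration, and the rules (mask positions of the last five digits with X; list every window of consecutive digits except the tail one, right to left) are my reading of the sample, not a confirmed description of the actual project.

```python
from itertools import combinations


def masked_variants(digits, keep):
    """Return `digits` with every choice of len(digits) - keep positions replaced by 'X'."""
    n = len(digits)
    variants = []
    for masked in combinations(range(n), n - keep):
        variants.append("".join("X" if i in masked else d for i, d in enumerate(digits)))
    return variants


def windows(phone, size):
    """Return all runs of `size` consecutive digits except the tail run, right to left."""
    last_start = len(phone) - size  # start index of the tail window, which is skipped
    return [phone[i:i + size] for i in range(last_start - 1, -1, -1)]


def build_rule_car(phone):
    """Build the `rule_car` object for one phone number (assumed rules, see above)."""
    tail5 = phone[-5:]
    return {
        "any5": tail5,
        "any4": ",".join(masked_variants(tail5, 4)),
        "any3": ",".join(masked_variants(tail5, 3)),
        "tail5": tail5,
        "tail4": phone[-4:],
        "tail3": phone[-3:],
        "continuous5": ",".join(windows(phone, 5)),
        "continuous4": ",".join(windows(phone, 4)),
        "continuous3": ",".join(windows(phone, 3)),
    }


rule_car = build_rule_car("13085669245")
```

Each comma-separated string would then be indexed so that a query term such as `6X245` can match it directly.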
In my case, full-text search is the least interesting feature of RediSearch. I'm more interested in running queries like:
FT.SEARCH items-index "(@brand:xxx @model:xxx)=>[KNN 10 @vector $vector as score]" ...
which translates to something like: for all items that match brand xxx and model xxx, get me the top 10 closest items to the given $vector. Stemming/normalization could be useful for the attribute filters, I guess, but the power is more about searching multiple attributes and returning other attributes/columns (and of course the vector similarity search is great).
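To make the semantics of that filter-then-KNN query concrete, here is a brute-force sketch in plain Python. It is not how RediSearch executes the query (a real index would avoid scanning every item); the item layout, the `filtered_knn` name, and the use of squared L2 distance are assumptions for illustration, since the actual metric depends on how the vector field is declared.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def filtered_knn(items, brand, model, query_vector, k=10):
    """Keep items matching brand and model, then return the k closest as (score, id)."""
    candidates = (it for it in items if it["brand"] == brand and it["model"] == model)
    scored = ((squared_l2(it["vector"], query_vector), it["id"]) for it in candidates)
    return heapq.nsmallest(k, scored)


# Hypothetical sample data.
items = [
    {"id": "a", "brand": "acme", "model": "m1", "vector": [0.0, 0.0]},
    {"id": "b", "brand": "acme", "model": "m1", "vector": [1.0, 1.0]},
    {"id": "c", "brand": "acme", "model": "m2", "vector": [0.1, 0.1]},
    {"id": "d", "brand": "other", "model": "m1", "vector": [0.0, 0.1]},
    {"id": "e", "brand": "acme", "model": "m1", "vector": [3.0, 4.0]},
]

top2 = filtered_knn(items, "acme", "m1", [0.0, 0.0], k=2)
```

Items `c` and `d` are closer to the query vector than `b`, but they fail the attribute filter and never reach the distance step.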
In Redis Cluster mode, I am unable to simultaneously call FT.SEARCH and JSON.SET operations within a single Lua script, because doing so involves different hosts and different slots, and cross-slot combination operations are not supported. This is one of the areas where I think Redis Cluster mode is not as perfect as it could be.
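For context on the cross-slot restriction: Redis Cluster maps every key to one of 16384 hash slots using CRC16 of the key, or of the part between the first `{` and the next `}` when that hash tag is non-empty. The sketch below computes the slot in Python; the key names are hypothetical, and `binascii.crc_hqx` with an initial value of 0 is used here as the CRC16 variant that matches Redis Cluster.

```python
import binascii


def key_slot(key: str) -> int:
    """Return the Redis Cluster hash slot (0..16383) for a key."""
    data = key.encode()
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # Only a non-empty {tag} replaces the full key as hash input.
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    return binascii.crc_hqx(data, 0) % 16384


# Hypothetical keys: unrelated names usually land in different slots...
slot_a = key_slot("tm:13085669245")
slot_b = key_slot("tm:13900001111")

# ...while keys sharing a hash tag are guaranteed to share a slot.
tagged_a = key_slot("{tm}:13085669245")
tagged_b = key_slot("{tm}:13900001111")
```

Hash tags keep related keys on one node, at the price of concentrating their load there, so they do not help when the records are meant to be spread across the cluster.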
@sirfz hey, can you DM me on Discord? I am curious to hear more about your use case.
@sirfz @dwzkit we will have an experimental version of FT.SEARCH in v1.10 (next release). Would you like to try it out?
I'm sorry, I've been busy with other projects recently and may not have time to experiment for the next two months.
Quoting a nodejs user, they want to create an index like this:
and then be able to query it like this:
Note: we do not need full-text search, stemming, query rewriting, or other language-related features. Instead, this task is about formal, semi-structured querying that will provide lots of value for folks who use RedisJSON.
This is a super task that should be broken down into smaller sub-projects: 1) Auto indexing (FT.CREATE). 2) Building a query AST with all the operators we support. 3) Executing a query without query plan optimizations.
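As a rough illustration of sub-projects 2 and 3, the sketch below defines a tiny query AST and evaluates it with a full scan, that is, with no index and no query planning. It is not Dragonfly's design: the node types (`Tag`, `Range`, `And`), the `execute` function, and the sample documents are all hypothetical, and only a small subset of operators is modeled.

```python
import math
from dataclasses import dataclass
from itertools import islice


@dataclass
class Tag:
    """Matches when the field equals any of the values, like @field:a|b."""
    field: str
    values: frozenset


@dataclass
class Range:
    """Matches when lo <= field <= hi, like @field:[lo hi] (inclusive bounds only)."""
    field: str
    lo: float
    hi: float


@dataclass
class And:
    """Matches when every child node matches."""
    children: list


def matches(node, doc):
    """Evaluate one AST node against one JSON-like document."""
    if isinstance(node, Tag):
        value = doc.get(node.field)
        return value is not None and str(value) in node.values
    if isinstance(node, Range):
        value = doc.get(node.field)
        return isinstance(value, (int, float)) and node.lo <= value <= node.hi
    if isinstance(node, And):
        return all(matches(child, doc) for child in node.children)
    raise TypeError(f"unknown node type: {type(node).__name__}")


def execute(node, docs, limit):
    """Scan every document in insertion order and return up to `limit` matching keys."""
    hits = (key for key, doc in docs.items() if matches(node, doc))
    return list(islice(hits, limit))


# Hypothetical documents shaped like the earlier FT.SEARCH example.
docs = {
    "tm:1": {"hitMassRuleId": "lastAAAA", "status": "1", "ttlInSecond": 1683364400},
    "tm:2": {"hitMassRuleId": "lastAAAA", "status": "0", "ttlInSecond": 1683364400},
    "tm:3": {"hitMassRuleId": "anyAAAA", "status": "1", "ttlInSecond": 100},
}
query = And([
    Tag("hitMassRuleId", frozenset({"lastAAAA", "anyAAAA"})),
    Tag("status", frozenset({"1"})),
    Range("ttlInSecond", 1683364311, math.inf),
])
result = execute(query, docs, limit=1)
```

Keeping the AST separate from execution means an index-backed executor could later replace the full scan without touching the parser.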