ashish-goyal / redis

Automatically exported from code.google.com/p/redis
BSD 3-Clause "New" or "Revised" License

[feature request] #613

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
It would be interesting if you could add "federation" to Redis. E.g. Redis 
server 1 has, let's say, 2 GB of data, and Redis server 2 has, let's say, 3 GB.

Then you query only server 1, and if the key is not found there, server 1 
automatically searches for the key on server 2.

Server 1 "swaps" to server 2.

If this were implemented on the server, there would be some advantages, such as 
intersections on the server rather than on the client (which has to retrieve 
all the sets), "unlimited" memory if you "connect" several servers, etc.

An explanation of where it could be used:

I needed this because I tried to build a full-text index on a table with 70M 
records. My estimate was that the index would take about 30 GB. Since I have no 
test environment with 30 GB, I could use 5-6 machines with 10-16 GB each. On the 
other hand, I cannot shard/hash, because then I cannot do fast intersects/unions 
between the servers.

Original issue reported on code.google.com by n...@photonhost.com on 27 Jul 2011 at 6:49

GoogleCodeExporter commented 8 years ago
Are you absolutely sure you want to use Redis for that? Did you consider 
full-text search engines (Sphinx, Solr, etc.)?

Original comment by Sergei.T...@gmail.com on 27 Jul 2011 at 9:16

GoogleCodeExporter commented 8 years ago
According to some benchmarks I did on 10M records (~6 GB of Redis memory), 
Redis is quite fast: less than a second.

Even PHP, with Redis strings such as "12,13,14" and "external" intersects 
(array_intersect()), can do it as well, in ~10 seconds. This is not fast, but it 
is quite impressive.
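For comparison, the client-side approach described above (fetch each term's ids as a plain string such as "12,13,14", then intersect them in the application, as array_intersect() does in PHP) can be sketched in Python as follows. The helper names are hypothetical, and the literal strings stand in for values a real client would fetch from Redis; the cost in practice comes from transferring and parsing every list.

```python
def parse_ids(raw: str) -> set[int]:
    """Parse a comma-separated posting list such as "12,13,14" into a set of ids."""
    return {int(part) for part in raw.split(",") if part}


def intersect_postings(*raw_lists: str) -> list[int]:
    """Intersect posting lists that were fetched as strings, entirely on the client."""
    if not raw_lists:
        return []
    # Start from the smallest list so the intermediate result stays small.
    sets = sorted((parse_ids(raw) for raw in raw_lists), key=len)
    result = sets[0]
    for other in sets[1:]:
        result &= other
        if not result:
            break  # nothing can survive further intersections
    return sorted(result)


# Hypothetical values, standing in for strings fetched with GET.
print(intersect_postings("12,13,14", "13,14,15", "14,13,99"))  # → [13, 14]
```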

We currently use Sphinx; it performs in under a second too, but the load 
average of the server goes up and down every second.

When we start the indexer, things get even worse :)

Original comment by n...@photonhost.com on 27 Jul 2011 at 2:11

GoogleCodeExporter commented 8 years ago
I did not say anything about Solr: my colleague said he tested it and the 
results were very bad.

Original comment by n...@photonhost.com on 27 Jul 2011 at 2:16

GoogleCodeExporter commented 8 years ago
Even if Redis "swapped" to other machines, it still would not do fast 
intersect/union operations.

Assuming that you are using sets or zsets to store your term -> documents 
lists, instead of performing...

conn.sadd('index:<term>', <docid>)

You could do the following:

conn.sadd('index:<term>:{<shardid>}', <docid>)

Then make sure your Redis client is sharding-aware (many of them are nowadays). 
To perform searches, you need to make that code sharding-aware too (perform the 
intersections/unions across multiple machines, pull results from all of them, 
etc.). It will be quite a bit of work in the short term, but once your search 
index grows larger than your current production box, it may pay off very well.
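A minimal sketch of that scheme, assuming documents are placed on shards by `docid % number_of_shards` and using a hypothetical in-memory `InMemoryShard` class in place of real Redis connections, so it runs without a server. Every key on a given shard carries the same `{<shardid>}` tag (clients and proxies that support hash tags hash only the part inside the braces), so each shard holds complete per-shard sets and can intersect them locally; the client then only has to union the partial results.

```python
from collections import defaultdict


class InMemoryShard:
    """Hypothetical stand-in for one Redis server, mimicking SADD and SINTER."""

    def __init__(self) -> None:
        self.sets: dict[str, set[int]] = defaultdict(set)

    def sadd(self, key: str, member: int) -> None:
        self.sets[key].add(member)

    def sinter(self, *keys: str) -> set[int]:
        # As in Redis, a key that does not exist behaves like an empty set.
        result = set(self.sets.get(keys[0], set()))
        for key in keys[1:]:
            result &= self.sets.get(key, set())
        return result


class ShardedIndex:
    """Routes each document to one shard; all of a shard's keys share its tag."""

    def __init__(self, shards: list[InMemoryShard]) -> None:
        self.shards = shards

    def shard_for(self, docid: int) -> int:
        # Assumption: simple modulo placement by document id.
        return docid % len(self.shards)

    @staticmethod
    def key(term: str, shardid: int) -> str:
        return f"index:{term}:{{{shardid}}}"  # e.g. "index:redis:{1}"

    def add(self, term: str, docid: int) -> None:
        shardid = self.shard_for(docid)
        self.shards[shardid].sadd(self.key(term, shardid), docid)

    def search_all(self, *terms: str) -> set[int]:
        if not terms:
            return set()
        # Scatter: each shard intersects its own sets (one SINTER per server).
        # Gather: a document lives on exactly one shard, so the union of the
        # per-shard intersections is the global intersection.
        hits: set[int] = set()
        for shardid, shard in enumerate(self.shards):
            hits |= shard.sinter(*(self.key(t, shardid) for t in terms))
        return hits


index = ShardedIndex([InMemoryShard() for _ in range(3)])
for term, docid in [("redis", 1), ("redis", 2), ("redis", 4),
                    ("search", 2), ("search", 4), ("search", 5)]:
    index.add(term, docid)
print(sorted(index.search_all("redis", "search")))  # → [2, 4]
```

With real servers, each `shard.sinter(...)` would become a SINTER sent to that shard's connection, and the calls could run in parallel.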

Also, there is always bare Lucene (which appears to be faster than Sphinx: 
http://ai-cafe.blogspot.com/2009/08/lucene-vs-sphinx-showdown-on-large.html) 
and Xapian.

Original comment by josiah.c...@gmail.com on 27 Jul 2011 at 6:52

GoogleCodeExporter commented 8 years ago
Hi,
I did not check the Redis source code, but what I mean is the following:

if (!key_exists(a))
   return get_remote_key(a);
...

For each Redis instance you could configure a single additional Redis instance. 
It is similar to replication, but kind of backwards :)
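The proposed fallback (`if (!key_exists(a)) return get_remote_key(a);`) can be expressed on the client side as a chain of stores tried in order. A minimal Python sketch, with plain dicts standing in for Redis servers; `federated_get` and the sample keys are hypothetical.

```python
from typing import Mapping, Optional, Sequence


def federated_get(stores: Sequence[Mapping[str, str]], key: str) -> Optional[str]:
    """Return the value from the first store that has `key`, else None.

    Mirrors `if (!key_exists(a)) return get_remote_key(a);`, but walks a whole
    chain of stores instead of a single additional server.
    """
    for store in stores:
        if key in store:  # key_exists(a)
            return store[key]
    return None


server1 = {"user:1": "alice"}
server2 = {"user:2": "bob"}
print(federated_get([server1, server2], "user:2"))  # → bob
```

This only covers single-key reads. Multi-key commands such as SINTER would still need every involved set on one machine, which is the objection raised in the replies.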

Original comment by nikolay....@gmail.com on 28 Jul 2011 at 6:49

GoogleCodeExporter commented 8 years ago
It looks simple in pseudo-code, but it really isn't. This is typically 
something that should be solved on the application side, by some kind of 
coordinating process, since you know the exact configuration of your machines, 
where data should be stored, and where it is when you need it. You can check out 
the unstable branch of Redis, which features DUMP/RESTORE to get a raw 
representation of keys and migrate them to other boxes.
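A minimal sketch of how a coordinating process might move one key with DUMP/RESTORE, assuming redis-py `Redis` clients (its `dump`, `pttl`, `restore` and `delete` methods); `migrate_key` itself is a hypothetical helper. It is not atomic: writes to the key between DUMP and DELETE would be lost, so a real coordinator would have to prevent them.

```python
def migrate_key(src, dst, key: str, replace: bool = False) -> bool:
    """Copy `key` from `src` to `dst` via DUMP/RESTORE, then delete it from `src`.

    `src` and `dst` are assumed to be redis-py clients, or anything exposing the
    same dump/pttl/restore/delete methods. Returns False if the key is missing.
    """
    payload = src.dump(key)  # raw serialized value, or None if the key is absent
    if payload is None:
        return False
    ttl_ms = src.pttl(key)
    # RESTORE takes a TTL in milliseconds where 0 means "no expiry";
    # PTTL reports a negative value for keys without one.
    dst.restore(key, max(ttl_ms, 0), payload, replace=replace)
    src.delete(key)
    return True


# Usage, assuming redis-py and two reachable servers (hypothetical host names):
#   import redis
#   migrate_key(redis.Redis(host="box-a"), redis.Redis(host="box-b"), "index:redis")
```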

As Josiah mentions, moving a set over the network in order to intersect it will 
never be a fast operation. The best approach I can think of here is some kind 
of scatter/gather algorithm that computes, on each shard, the partial 
intersection of the sets that live there (map, if you will), migrates those 
partial results to a single machine, and performs the final intersection 
(reduce, if you will).
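A minimal sketch of that map/reduce-style intersection, assuming each term's whole set lives on exactly one shard (a hypothetical `placement` table records which) and using dicts of Python sets in place of Redis servers. Only the partial results cross the network, which keeps the transfer small when the local intersections are selective.

```python
from collections import defaultdict


def scatter_gather_intersect(
    placement: dict[str, int],
    shards: list[dict[str, set[int]]],
    terms: list[str],
) -> set[int]:
    """Intersect the sets for `terms`, where each whole set lives on one shard."""
    by_shard: dict[int, list[str]] = defaultdict(list)
    for term in terms:
        if term not in placement:
            return set()  # an unknown term matches no documents
        by_shard[placement[term]].append(term)

    # Map: each shard intersects only the sets it holds (SINTER on that server).
    partials = [
        set.intersection(*(shards[shardid].get(t, set()) for t in local_terms))
        for shardid, local_terms in by_shard.items()
    ]

    # Reduce: ship only the partial results to one machine and intersect them.
    if not partials:
        return set()  # no terms, no hits
    return set.intersection(*partials)


shards = [
    {"redis": {1, 2, 3, 4}, "fast": {2, 3, 4, 9}},  # shard 0
    {"search": {2, 4, 5, 6}},  # shard 1
]
placement = {"redis": 0, "fast": 0, "search": 1}
print(sorted(scatter_gather_intersect(placement, shards, ["redis", "fast", "search"])))  # → [2, 4]
```

Here shard 0 sends only {2, 3, 4} instead of both of its full sets. When sets are instead split by document id, as in the `index:<term>:{<shardid>}` scheme earlier in the thread, the gather step becomes a union of per-shard intersections rather than an intersection.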

Closing because the original idea will not be added.

Original comment by pcnoordh...@gmail.com on 28 Jul 2011 at 8:08