Are you absolutely sure you want to use Redis for that? Did you consider
full-text search engines (Sphinx, Solr, etc.)?
Original comment by Sergei.T...@gmail.com
on 27 Jul 2011 at 9:16
According to some benchmarks I ran on 10M records (~6 GB of Redis memory), Redis
is quite fast: less than a second.
Even PHP, using Redis strings like "12,13,14" and "external" intersections
(array_intersect()), manages it in ~10 seconds. That is not fast, but it is
quite impressive.
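For readers unfamiliar with the "external" approach mentioned above, here is a rough Python analogue (the comment itself used PHP's array_intersect()). It is only a sketch: posting lists are assumed to be stored as comma-separated strings such as "12,13,14", and the helper names parse_ids() and external_intersect() are hypothetical.

```python
# Sketch of an "external" intersection: each term's posting list is stored
# as a comma-separated string (e.g. "12,13,14") and the intersection is
# done in application code rather than inside Redis.
# The sample values below are hypothetical, not the benchmark's data.


def parse_ids(value: str) -> list[int]:
    """Turn a stored string like "12,13,14" into a list of document ids."""
    return [int(part) for part in value.split(",") if part]


def external_intersect(values: list[str]) -> list[int]:
    """Intersect several comma-separated posting lists in the application,
    keeping the order of the first list (roughly like PHP's array_intersect())."""
    if not values:
        return []
    first, *rest = (parse_ids(v) for v in values)
    common = set(first)
    for ids in rest:
        common &= set(ids)
    return [doc_id for doc_id in first if doc_id in common]


if __name__ == "__main__":
    # In practice these strings would come from Redis GET calls.
    fetched = {"index:redis": "12,13,14", "index:search": "13,14,15"}
    print(external_intersect(list(fetched.values())))
```

With real data every list has to be transferred and parsed before the intersection can start, which is likely where much of the extra time goes compared with intersecting inside Redis.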
We currently use Sphinx. It also answers in under a second, but the server's
load average goes up and down every second.
When we start the indexer, things get even worse :)
Original comment by n...@photonhost.com
on 27 Jul 2011 at 2:11
I did not say anything about Solr; a colleague of mine said he tested it and the
results were very bad.
Original comment by n...@photonhost.com
on 27 Jul 2011 at 2:16
Even if Redis "swapped" to other machines, it still would not do fast
intersect/union operations.
Assuming that you are using sets or zsets to store your term -> document
lists, instead of performing...
conn.sadd('index:<term>', <docid>)
You could do the following:
conn.sadd('index:<term>:{<shardid>}', <docid>)
Then make sure your Redis client is sharding-aware (many of them are nowadays).
Your search code needs to be sharding-aware too (perform the
intersections/unions across multiple machines, pull results from all of them,
etc.). It will be quite a bit of work in the short term, but once your search
index grows larger than your current production box, it may pay off very well.
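A minimal Python sketch of the sharded layout suggested above, assuming documents are assigned to shards by a hash of their id (the hypothetical shard_for()), so that every term of a given document lands on the same shard. In-memory dicts of sets stand in for separate Redis servers; with redis-py you would issue SADD and SINTER on each shard's own connection instead. NUM_SHARDS, shard_for(), add_document() and search() are illustrative names, not part of any Redis client API.

```python
# Sketch of a document-sharded inverted index using the key layout
# index:<term>:{<shardid>}. Each entry in `shards` is a hypothetical
# in-memory stand-in for one Redis server (key -> set of doc ids).
import zlib

NUM_SHARDS = 4
shards: list[dict[str, set[int]]] = [{} for _ in range(NUM_SHARDS)]


def shard_for(doc_id: int) -> int:
    """Pick the shard from the document id, so all of a document's terms
    end up on the same shard (needed for per-shard intersections)."""
    return zlib.crc32(str(doc_id).encode()) % NUM_SHARDS


def index_key(term: str, shard_id: int) -> str:
    """Build a key like index:redis:{2}."""
    return f"index:{term}:{{{shard_id}}}"


def add_document(doc_id: int, terms: list[str]) -> None:
    """Equivalent of SADD index:<term>:{<shardid>} <docid> for each term."""
    sid = shard_for(doc_id)
    for term in terms:
        shards[sid].setdefault(index_key(term, sid), set()).add(doc_id)


def search(terms: list[str]) -> set[int]:
    """Intersect on every shard (SINTER there), then union the partial results."""
    results: set[int] = set()
    for sid, store in enumerate(shards):
        postings = [store.get(index_key(t, sid), set()) for t in terms]
        if postings:
            results |= set.intersection(*postings)
    return results
```

Because each document lives entirely on one shard, intersecting per shard and then taking the union of the per-shard results gives the same answer as intersecting global sets, and only the matching ids have to travel between machines.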
Also, there is always bare Lucene (which looks to be faster than Sphinx:
http://ai-cafe.blogspot.com/2009/08/lucene-vs-sphinx-showdown-on-large.html),
and Xapian.
Original comment by josiah.c...@gmail.com
on 27 Jul 2011 at 6:52
Hi
I have not checked the Redis source code, but what I mean is the following:
if (!key_exists(a))
return get_remote_key(a);
...
For each Redis instance you could configure a single additional Redis instance
to fall back to. Similar to replication, but kind of backwards :)
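The pseudocode above could be approximated in application code roughly as follows. This is a sketch under assumptions: two plain dicts stand in for the local and the single additional Redis instance, and FallbackStore is a hypothetical name, not a Redis feature.

```python
# Application-side version of the "if the key is not here, ask the other
# instance" idea. The dicts are hypothetical stand-ins for a local and a
# remote Redis instance; key_exists()/get_remote_key() from the pseudocode
# correspond to the dict lookups below.
from typing import Optional


class FallbackStore:
    def __init__(self, local: dict[str, str], remote: dict[str, str]) -> None:
        self.local = local
        self.remote = remote

    def get(self, key: str) -> Optional[str]:
        # Serve from the local instance when the key exists there...
        if key in self.local:
            return self.local[key]
        # ...otherwise ask the single configured remote instance.
        return self.remote.get(key)
```

This only covers single-key reads. An intersection whose sets are split between the two instances would still have to pull a whole set over the network first.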
Original comment by nikolay....@gmail.com
on 28 Jul 2011 at 6:49
It looks simple in pseudo-code, but it really isn't. This is typically
something that should be solved on the application side, by some kind of
coordinating process, since you know the exact configuration of your machines,
where data should be stored, and where it is when you need it. You can check
out the unstable branch of Redis, which features DUMP/RESTORE to get a raw
representation of keys and migrate them to other boxes.
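A hedged sketch of migrating one key with DUMP/RESTORE from the application side. migrate_key() is a hypothetical helper that calls only dump, pttl, restore and delete, which redis-py's Redis client provides, so src and dst could be redis.Redis connections; InMemoryRedis is a stand-in so the example runs without a server, and it ignores expiries.

```python
# Move one key between two instances using DUMP/RESTORE semantics.
# InMemoryRedis is a hypothetical fake exposing only the calls
# migrate_key() needs; with redis-py, pass redis.Redis connections instead.
from typing import Optional


class InMemoryRedis:
    """Tiny fake: stores opaque dumped payloads, never expires keys."""

    def __init__(self) -> None:
        self.data: dict[str, bytes] = {}

    def dump(self, key: str) -> Optional[bytes]:
        return self.data.get(key)  # real DUMP returns a serialized blob

    def pttl(self, key: str) -> int:
        return -1 if key in self.data else -2  # -1: no expiry, -2: missing

    def restore(self, key: str, ttl: int, value: bytes) -> None:
        self.data[key] = value

    def delete(self, key: str) -> int:
        return 1 if self.data.pop(key, None) is not None else 0


def migrate_key(src, dst, key: str) -> bool:
    """Copy `key` from src to dst with DUMP/RESTORE, then delete it on src.
    Not atomic: writes to src between the steps can be lost."""
    payload = src.dump(key)  # raw serialized value, or None if missing
    if payload is None:
        return False
    ttl_ms = src.pttl(key)  # negative means no expiry (or already gone)
    dst.restore(key, ttl_ms if ttl_ms > 0 else 0, payload)  # 0 = no expiry
    src.delete(key)
    return True
```

On a real server RESTORE refuses to overwrite an existing key unless the REPLACE option is given, and later Redis versions also added a MIGRATE command that performs this move server-side.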
As Josiah mentions: moving a set over the network in order to intersect it will
never be a fast operation. The best approach I can think of here is some kind
of scatter/gather algorithm that computes the partial intersection of the sets
living on each shard (map, if you will), migrates those results to a single
machine, and performs the final intersection (reduce, if you will).
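The scatter/gather idea can be sketched in Python as below, assuming each set is stored whole on exactly one shard, placed by a hypothetical hash of the key name (shard_of()). In-memory dicts stand in for Redis servers; on real servers the map step would be one SINTER per shard over the keys it holds, and only its result would need to be moved. sadd() and scatter_gather_intersect() are illustrative names.

```python
# Scatter/gather intersection over sets partitioned by key name.
# `shards` holds hypothetical in-memory stand-ins for Redis servers.
from collections import defaultdict
from functools import reduce
import zlib

NUM_SHARDS = 3
shards: list[dict[str, set[int]]] = [{} for _ in range(NUM_SHARDS)]


def shard_of(key: str) -> int:
    """Hypothetical placement rule: hash the key name to pick its shard."""
    return zlib.crc32(key.encode()) % NUM_SHARDS


def sadd(key: str, *members: int) -> None:
    shards[shard_of(key)].setdefault(key, set()).update(members)


def scatter_gather_intersect(keys: list[str]) -> set[int]:
    # Scatter: group the requested keys by the shard that owns them.
    by_shard: dict[int, list[str]] = defaultdict(list)
    for key in keys:
        by_shard[shard_of(key)].append(key)
    # Map: each shard intersects only the sets it holds locally.
    partials = [
        set.intersection(*(shards[sid].get(k, set()) for k in ks))
        for sid, ks in by_shard.items()
    ]
    # Reduce: bring the partial results together and intersect them.
    return reduce(set.intersection, partials) if partials else set()
```

Only the partial intersections cross shard boundaries, and each one is at most as large as the smallest set on its shard, which is usually far less data than shipping every set to one machine.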
Closing because the original idea will not be added.
Original comment by pcnoordh...@gmail.com
on 28 Jul 2011 at 8:08
Original issue reported on code.google.com by
n...@photonhost.com
on 27 Jul 2011 at 6:49