disqus / nydus

Nydus is a Python toolkit for managing database connections and routing operations, primarily for Redis
Apache License 2.0
380 stars 38 forks source link

Redis ZUNIONSTORE Weirdness #29

Closed woozyking closed 11 years ago

woozyking commented 11 years ago

Hi,

I noticed this bug earlier today when I was playing with nydus to set up a Redis Cluster.

The base environment setup are:

My testing code looks like the this: (note: a is normal Redis connection using official Redis Python binding; b is a cluster returned by nydus.db.create_cluster())

import redis
from nydus.db import create_cluster

pool = redis.ConnectionPool(host='localhost', port=6379, db=0)
a = redis.Redis(connection_pool=pool)
b = create_cluster({
    'backend': 'nydus.db.backends.redis.Redis',
    'router': 'nydus.db.routers.keyvalue.PartitionRouter',
    'hosts': {
        0: {'port': 63790, 'db': 0},
        1: {'port': 63790, 'db': 1},
        2: {'port': 63790, 'db': 2},
    }
})

a.flushdb()
b.flushdb()

a.zadd('test1', 'user1', 5)
b.zadd('test1', 'user1', 5)
a.zadd('test2', 'user1', 13)
b.zadd('test2', 'user1', 13)

a.zunionstore('test_sum', ('test1', 'test2'), 'SUM')
b.zunionstore('test_sum', ('test1', 'test2'), 'SUM')
a.zunionstore('test_min', ('test1', 'test2'), 'MIN')
b.zunionstore('test_min', ('test1', 'test2'), 'MIN')
a.zunionstore('test_max', ('test1', 'test2'), 'MAX')
b.zunionstore('test_max', ('test1', 'test2'), 'MAX')

assert a.zrange('test_sum', 0, -1, withscores=True) == b.zrange('test_sum', 0, -1, withscores=True)
assert a.zrange('test_min', 0, -1, withscores=True) == b.zrange('test_min', 0, -1, withscores=True)
assert a.zrange('test_max', 0, -1, withscores=True) == b.zrange('test_max', 0, -1, withscores=True)

I also tested with StrictRedis, but that did not change the matter (except that I had to change the way I called a.zadd)

dcramer commented 11 years ago

You can't expect random magical distributed queries to work :)

its likely that one key hashes to one node, and another to the other, so when you do a server side union (that is, zunionstore) its just executing it on one node.

We could special case zunionstore to say "route to all nodes" or we could just yell at a user for calling it.

On Tuesday, March 12, 2013 at 2:57 PM, oEL wrote:

Hi, I noticed this bug earlier today when I was playing with nydus to set up a Redis Cluster. The base environment setup are: Ubuntu 12.10 Python 2.7.3 Redis 2.6.10 redis-py (redis) 2.7.2 nydus 0.10.4
My testing code looks like the this: (note: a is normal Redis connection using official Redis Python binding; b is a cluster returned by nydus.db.create_cluster())
import redis from nydus.db import create_cluster pool = redis.ConnectionPool(host='localhost', port=6379, db=0) a = redis.Redis(connection_pool=pool) b = create_cluster({ 'backend': 'nydus.db.backends.redis.Redis', 'router': 'nydus.db.routers.keyvalue.PartitionRouter', 'hosts': { 0: {'port': 63790, 'db': 0}, 1: {'port': 63790, 'db': 1}, 2: {'port': 63790, 'db': 2}, } }) a.flushdb() b.flushdb() a.zadd('test1', 'user1', 5) b.zadd('test1', 'user1', 5) a.zadd('test2', 'user2', 13) b.zadd('test2', 'user2', 13) a.zunionstore('test_sum', ('test1', 'test2'), 'SUM') b.zunionstore('test_sum', ('test1', 'test2'), 'SUM') a.zunionstore('test_min', ('test1', 'test2'), 'MIN') b.zunionstore('test_min', ('test1', 'test2'), 'MIN') a.zunionstore('test_max', ('test1', 'test2'), 'MAX') b.zunionstore('test_max', ('test1', 'test2'), 'MAX') assert a.zrange('test_sum', 0, -1, withscores=True) == b.zrange('test_sum', 0, -1, withscores=True) assert a.zrange('test_min', 0, -1, withscores=True) == b.zrange('test_min', 0, -1, withscores=True) assert a.zrange('test_max', 0, -1, withscores=True) == b.zrange('test_max', 0, -1, withscores=True)

I also tested with StrictRedis, but that did not change the matter (except that I had to change the way I called a.zadd)

— Reply to this email directly or view it on GitHub (https://github.com/disqus/nydus/issues/29).

woozyking commented 11 years ago

@dcramer I really though nydus auto-handles that case. Otherwise, is there any standard practice of doing what you have suggested? This might apply to more Redis commands, not just zunionstore, I suppose.

dcramer commented 11 years ago

Nydus supports distributed commands, but not things like zunionstore:

with cluster.map() as conn: results = [conn.zrange(k) for k in ['key1', 'key2']]

exit context to execute commands

print results

On Tuesday, March 12, 2013 at 3:02 PM, oEL wrote:

@dcramer (https://github.com/dcramer) I really though nydus auto-handles that case. Otherwise, is there any standard practice of doing what you have suggested? This might apply to more Redis commands, not just zunionstore, I suppose.

— Reply to this email directly or view it on GitHub (https://github.com/disqus/nydus/issues/29#issuecomment-14807427).

woozyking commented 11 years ago

So currently this is not possible? Or how do I route to all notes? Does that mean I have to really fetch the key using zrange and do a in Python SUM/MAX/MIN or such operation and store it back to a specific key using zadd again?

dcramer commented 11 years ago

I'm not sure if its documented, but there's an API to get connections. Its conn.get_db(num) I believe.

Otherwise the alternative is to customize a Redis router which changes routing behavior on zunionstore so it checks all args and only sends the keys to the correct nodes (or just all nodes).

On Tuesday, March 12, 2013 at 3:10 PM, oEL wrote:

So currently this is not possible? Or how do I route to all notes? Does that mean I have to really fetch the key using zrange and do a in Python SUM/MAX/MIN or such operation and store it back to a specific key using zadd again?

— Reply to this email directly or view it on GitHub (https://github.com/disqus/nydus/issues/29#issuecomment-14808223).

woozyking commented 11 years ago

Thanks @dcramer for all the help. I believe it's BaseCluster.get_conn. But the problem here is that I wouldn't know where a specific key (in this case sorted set) is stored, hence no way to know which connection to get. Unless there's a function/method that retrieves the connection based on the key provided.

So as of now, not an optimal solution, but we'll probably just do a map() zrange and load them into Python to do sum and such, then store back using conn.zadd.

dcramer commented 11 years ago

I'd definitely recommend just doing that, as no matter what you're going to have to route all gets/ranges (or unions) on all applicable nodes and join their results in python anyways.

On Tuesday, March 12, 2013 at 3:24 PM, oEL wrote:

Thanks @dcramer (https://github.com/dcramer) for all the help. I believe it's BaseCluster.get_conn. But the problem here is that I wouldn't know where a specific key (in this case sorted set) is stored, hence no way to know which connection to get. Unless there's a function/method that retrieves the connection based on the key provided. So as of now, not optimal solution, but we'll probably just do a map() zrange and load them into Python to do sum and such, then store back using conn.zadd.

— Reply to this email directly or view it on GitHub (https://github.com/disqus/nydus/issues/29#issuecomment-14809467).

woozyking commented 11 years ago

Thank you! This is solves a lot of problems here.

But aren't we all expect official redis cluster to come out ;)

ÔÚ 2013-03-12£¬6:28 PM£¬David Cramer notifications@github.com дµÀ£º

I'd definitely recommend just doing that, as no matter what you're going to have to route all gets/ranges (or unions) on all applicable nodes and join their results in python anyways.

On Tuesday, March 12, 2013 at 3:24 PM, oEL wrote:

Thanks @dcramer (https://github.com/dcramer) for all the help. I believe it's BaseCluster.get_conn. But the problem here is that I wouldn't know where a specific key (in this case sorted set) is stored, hence no way to know which connection to get. Unless there's a function/method that retrieves the connection based on the key provided. So as of now, not optimal solution, but we'll probably just do a map() zrange and load them into Python to do sum and such, then store back using conn.zadd.

¡ª Reply to this email directly or view it on GitHub (https://github.com/disqus/nydus/issues/29#issuecomment-14809467).

¡ª Reply to this email directly or view it on GitHub.

woozyking commented 11 years ago

I pulled off a solution with the aforementioned methodology. Though the algorithm could be largely optimized, see: https://gist.github.com/woozyking/07db9ec24fd31ea6e65c

Hope this could be useful for anyone who has the same implementation needs.