Lachim / redis

Automatically exported from code.google.com/p/redis
2 stars 0 forks source link

[FEATURE REQUEST] Add ZDIFFSTORE, and non-STORE versions of ZDIFF/ZUNION/ZINTER #579

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Sets have SDIFF for computing set difference, and commands like SUNION for 
returning a computation "on the fly" without permanently storing anything on 
the server. I'd like these features for sorted sets, so that I can call 
ZDIFFSTORE to remove from sorted-set z1 any members which also occur in 
sorted-set z2.

Original issue reported on code.google.com by a...@malloys.org on 8 Jun 2011 at 11:45

GoogleCodeExporter commented 8 years ago
Non-storing variants of union and intersection of sorted sets will not be 
added, because the result of these operations holds too much information to 
give back in one pass. See this issue: 
http://code.google.com/p/redis/issues/detail?id=328

The rationale behind not having a ZDIFFSTORE is the lack of a *real* use case, 
what will happen with the scores etc. This has been discussed numerous times on 
the ML, you can find the search/archives on groups.google.com.

Cheers,
Pieter

Original comment by pcnoordh...@gmail.com on 14 Jun 2011 at 6:56

GoogleCodeExporter commented 8 years ago
Here is my use case for ZDIFFSTORE. Exactly the same as SDIFFSTORE.
I have a python client which stores objects of a given model in redis and also 
install indices for fields needing queries. Indices are redis sets or sorted 
sets depending on the type of model chosen (models may or may not need a 
default sorting with respect a model field).

Indices are used to perform queries and everything is fine for sets. Not so for 
sorted sets.
Here is an example I hope it clarifies my point:

class SportActivity(orm.StdNet):
    person = orm.SymbolField(index = True)   # this field creates a sorted set
    type = orm.SymbolField(index = True)     # this field creates a sorted set
    dt = orm.DateField()

    class Meta:
        sort_by = 'dt'  # this tells the model to zset rather than sets as indices.

If I perform the following query:

res = SportActivity.objects.filter(person = 'pippo').exclude(type = 'swimming')

This translates to the following redis operation

ZINTERSTORE(ids_zset,person_pippo_ids_zset,tempkey1) does the filter on 'pippo' 
bit
ZDIFFSTORE(tempkey1,type_swimming_ids_zset,result) does the exclude bit

person_pippo_ids_zset is a sorted set of ids of objects with person equal to 
'pippo'
type_swimming_ids_zsetis a sorted set of ids of objects with type equal to 
'swimming'

The scores in the ZDIFFSTORE will be the scores of tempkey1 since they are the 
only elements left in the result set.

Original comment by luca.sbardella on 17 Jun 2011 at 8:48

GoogleCodeExporter commented 8 years ago
If your scores in type_swimming_ids_zset are the same as person_pippo_ids_zset, 
then you could use ZUNIONSTORE(tempkey1, type_swimming_ids_zset, result, 
weights=[1, -1]), followed by a ZREMRANGEBYSCORE(result, '-inf', 0) to discard 
the entries.

Really though, the problem is as Salvatore mentioned: how do you handle scores? 
From the perspective of a plain "set", the answer is discard any entry in A 
that is in B for A-B. But if you use the "bag-set" semantic (of which zsets are 
commonly used for), then the answer is that you subtract the values (the ZUNION 
variant I provided above) and discard the values that fall outside the bag-set 
range.

If you really need to do set subtraction, you can always duplicate your data 
into a plain set, do your set difference there, then intersect with your zset. 
Using your provided syntax, that would look something like...

SDIFFSTORE(ids_set, type_swimming_ids_set, tempkey1)
ZINTERSTORE(tempkey1, person_pippo_ids_zset, result)

Original comment by josiah.c...@gmail.com on 17 Jun 2011 at 8:41

GoogleCodeExporter commented 8 years ago
Redis sorted sets are not multisets, therefore there is no problem in doing A-B.
Of course you can do A-B by doing other operations.
But that is not the point.

Original comment by luca.sbardella on 18 Jun 2011 at 8:45

GoogleCodeExporter commented 8 years ago
I have implemented a zdiffstore command, about 30 lines of code,
which can be found at

https://github.com/lsbardel/redis/blob/quantredis/src/t_zset.c

and wrote a brief explanation here
https://github.com/lsbardel/redis/blob/quantredis/extracommands.rst

By using the command *withscores* set to false, zsets are treated as standard 
sets (but ordered) and therefore the score plays no role in the difference. 
This variant is what I use with sorted indices described above.

If *withscore* set to true, the difference only occurs when the score is also 
matched.

Happy to send a pull request if interested.

Luca

Original comment by luca.sbardella on 20 Jun 2011 at 8:47

GoogleCodeExporter commented 8 years ago
Hmm, would doing MULTI STORE/query/DEL pattern against a slave screw things up? 
Effectively you only want to query here, and it would suck to not be able to 
run these queries against slaves because they mutate things. 

Original comment by realy...@gmail.com on 21 Jun 2011 at 9:56

GoogleCodeExporter commented 8 years ago
luca: Zsets can be used as a multiset. The score is how many of that item there 
are. I've used them this way before.

realy: You can mutate data on the slave without affecting the master or other 
slaves. You should keep mutations to temporary keys (for the sake of 
consistency), but there is nothing stopping you from doing anything to a slave 
that you can also do on the master.

Original comment by josiah.c...@gmail.com on 22 Jun 2011 at 12:19