Pydes-boop commented 8 months ago

Description

When using redis-dict and the delete keyword, a keyerror is not thrown when the key does not exist in the redis dictionary. This should be default behaviour as specified by pythons dictionary implementation.

The test for deletion do not check for this either, the following implementation shows the error behaviour:

def test_delete(self):
    """Test deleting a key."""
    self.r['foobargone'] = 'bars'

    del self.r['foobargone']

    self.assertEqual(self.redisdb.get('foobargone'), None)

    # expected standard dictionary behaviour
    # Keyerror is raised
    with self.assertRaises(expected_exception=KeyError):
        d = dict()
        del d['foobargone']

    # incorrect redis_dict behaviour
    # Keyerror not raised
    with self.assertRaises(expected_exception=KeyError):
        del self.r['foobargone']

Proposal

As currently redis dict only forwards the delete call to redis here the most straightforward solution would be to check for existence in redis with an additional call, however that would mean an additional redis call for every delete.

Otherwise the call to redis-py seems to already return a bitstring of 0 or 1 depending on if the delete was succesfull, so maybe we could simply catch this and use it as a return value? (Im a little bit unclear on the redis-py documentation here, but testing and simply printing out the return seems to confirm this behaviour).

I would love some feedback on this, as i suggest simply expanding the behaviour of delitem to catch the return from redis and throw a keyerror dependent on that return, as well as add one additional test specifically checking for this behaviour. I could propose a PR later today.

Impact / Additional Thoughts

multi_delete is also not concerned about this. Should this also be expanded to throw a keyerror in case of deleting items which do not exist? Otherwise i dont think there would be any other functionality that is impacted by this proposal.

Attumm commented 8 months ago

Great find. If it's not the same as the Dictionary, it is something that should be changed.

But at the same time, some people will start experiencing race conditions that they haven't encountered before, such as what if server a and b delete a value instead of using .pop(value, None)? At that moment, one of the programs will encounter a KeyError. However, the library is called redis-dict, and then that should be philosophy. Therefore, we could document .pop(value, None) in the delete docstring, maybe should make a new method that has the old behaviour? I'm not sure, I'll have to ponder about it a bit longer.

Best-case scenario: If we get 0, 1 one back from the delete, we can use it to determine whether we need to throw an error.

Worst-case scenario: We have to make 2 calls to redis. That would mean 2 times the network cost. But for that, we could use the pipeline; the pipeline can handle multiple calls. But we don't want to get the actual data when we check the key since the data underneath could be large. Thus, we would still pay a networking cost.

Multi_delete is meant to work with other multi methods. You make a good point, but multi delete is best left as it currently is because the multi methods are used for nesting, it might have consequences that are not easy to spot, such as race conditions.

Pydes-boop commented 8 months ago

hm im not sure if a new method for the old behaviour is the right call, as we already have a method that supports the old behaviour .pop(value, None). Additionally as we are dealing with built-ins here it wouldnt be too preferable for the user anyways to provide an additional method, as they would have to change their call from del dict[key] to dict.functioncall(args) anyways in which case they could simply use pop.

Otherwise i definitly see how this could suddenly introduce errors to preexisting projects using this library, one way to eleviate this potential nuissance for users could be to provide a bool that can be set for redis-dict and turns the old behaviour back on? But this also is not a very pretty solution.

I think in the case that an additional call or increased network cost would be required i would simply document this behaviour that differs from dictionary explicitly, in the case that we can simply use the redis delete call to determine if we should throw an error a behaviour change should be introduced (and this latter case seems to be the most likely from my limited testing)

Attumm commented 2 weeks ago

As we are approaching the end of the year, I went over the open issues again. It seemed like a great time to implement what we have discussed this year. :) RedisDict will get an opt-in flag for strict dictionary behavior With that in place using your suggestion made it trivial to fix the KeyError on deletion issue. Thanks.

To illustrate the core of the issue. When looking at the code in isolation, both scripts are almost identical and would work within a single dictionary context, both shouldn't raise alarms during a code review.

# System 1
d["foo"] = "bar_one"
del d["foo"]  

# System 2  
d["foo"] = "bar_two"
del d["foo"]

But in a distributed environment when these operations run on different machines, the Redis instance could receive operations in this sequence:

Operation sequence:
set "foo" "bar_one"   
set "foo" "bar_two"   
del "foo"  -- returns 1 (key was found and deleted)
del "foo"  -- returns 0 (key not found, already deleted)

From a user perspective, they try to delete data under key, both scripts are at the same state in the end.

But if we would throw the error one app will get the KeyError from Redis since the key is not found anymore (it was already deleted by the other system).

This shows why raising KeyError could be a issue even though both scripts are valid, in a distributed environment one could fail due to the timing of operations at the Redis level.

Raising a KeyError would force users to handle an error for achieving a state that is already present. the key not being there. This is especially problematic with shared storage where identical code running on different systems should achieve the same result.

For users who specifically want Python dictionary behavior, we'll implement this under the opt-in feature: dict_compliant

Users who want strict local dictionary behavior can enable it with:

dic = RedisDict(dict_compliant=True)

This maintains the current practical behavior by default while providing the option for strict dictionary compliance when needed. And we can build the other features under the opt-in flag.

Attumm / redis-dict

delete does not throw keyerror #46

Description

Proposal

Impact / Additional Thoughts