Attumm / redis-dict

Python dictionary with Redis as backend, built for large datasets. Simplifies Redis operations for large-scale and distributed systems. Supports various data types, namespacing, pipelining, and expiration.
https://attumm.github.io/redis-dict/
MIT License
53 stars 14 forks source link

setdefault contains a race condition #60

Closed shughes-uk closed 1 month ago

shughes-uk commented 1 month ago

When concurrency is involved, the setdefault method is a race condition where it can overwrite the value set by another instance

        found, value = self._load(key)
        # <- other instance could update the key here
        if not found:
            # we just overwrote the other key
            self[key] = default_value
            return default_value
        return value

This is a useful function, but maybe worth noting in the docstring that using in distributed systems has this issue

Attumm commented 1 month ago

You're right, that's a valid observation.

It's an oversight that I missed. There is also another method that I was aware of with that issue pop. The fix is to use pipelining just like the other methods. It seems straightforward, but with distributed computing there are always edge cases.

Redis-Dict supports nested pipelining. Since Redis is single-threaded, pipelining allows us to send commands in one go, effectively removing the race condition.

There is a fix for both this MR. Will have to add some more rigorous tests next week, since we are dealing with distributed computing.

Thanks for your comment.

Attumm commented 1 month ago

During testing today, I discovered that adding the pipeline didn't resolve the issue. The timing issue persisted. While it's unlikely to be triggered in practice, since "setdefault" isn't widely known or used, we shouldn't leave race conditions in the code.

While reviewing Redis documentation, I found a better solution using Redis's SET command with the "GET" option, which performs an atomic operation to both set a value and retrieve the previous value. This approach eliminates the need for pipelining or batching entirely.

The implementation took some digging due to redis-py's quirks. When using SET with GET, you need to pass the rather cryptic {"get": True} as an argument to the redis execute_command method. Without this non-intuitive parameter, you'll get a boolean instead of the actual value. This occurs because the command parser's behavior is determined under normal conditions by the command, our case the "SET" command.

 result = redis.execute_command("SET", formatted_key, formatted_value, "NX", "GET", **{"get": True})

This has been a valuable addition to the project, and it helped me better understand both the Redis SET command's capabilities and the redis-py package's implementation details.

Reference: https://redis.io/docs/latest/commands/set/