Closed adiholden closed 1 day ago
The problem is with running scan command under script. scan is not transactional and uses EngineShardSet Await to run the scanop on a shard.
If you run dragonfly with the following flags ./dragonfly --alsologtostderr --dbfilename= --proactor_threads=6 --vmodule=generic_family=1 you can see that the scanop is getting deadlocked when trying to run the scanop on the shard the script was scheduled on (the shard of the key provided for the script)
We can reproduce the bug also with this simple script
eval "return redis.call('scan', '0', 'match', 'key:', 'count', '1000')" 1 key:
Should we just refuse running SCAN in the script (return error) ?
Another option is to support patterns that have specific hashslot inside like {foo}
. In that case if we identify that scan is single sharded we can continue running it in the script under lock_on_hashtag mode
Should we just refuse running SCAN in the script (return error) ?
I assume this is a very common usecase
How to reproduce:
Run the following commands in redis cli:
script load " local pattern = KEYS[1]\n local oldest = tonumber(ARGV[1])\n local cursor = ARGV[2]\n local count = tonumber(ARGV[3])\n local result = redis.call(\"SCAN\", cursor, \"MATCH\", pattern, \"COUNT\", count)\n cursor = result[1]\n local keys = result[2]\n for i, key in ipairs(keys) do\n local to_delete = {}\n for _, timestamp in ipairs(redis.call('hkeys', key)) do\n if tonumber(timestamp) <= oldest then\n table.insert(to_delete, timestamp)\n end\n end\n if #to_delete > 0 then\n redis.call('hdel', key, unpack(to_delete))\n end\n end\n return cursor\n"
step 3 does not return causing transaction getting stuck