StackExchange.Redis.RedisServerException: ERR Script attempted to access a non local key in a cluster node script

sohaibameenpk007 commented 1 year ago

Hello Everyone,

I started getting this error on production while clearing out one of the key data on Redis. It works fine on Stage but generated an error on PROD.

full_message 2023-07-10 13:55:17,673 - ERR Script attempted to access a non local key in a cluster node script: fea3235e8ce23bcab0e0ad87c68f38d7b03cda64, on @user_script:4.

StackExchange.Redis.RedisServerException: ERR Script attempted to access a non local key in a cluster node script: fea3235e8ce23bcab0e0ad87c68f38d7b03cda64, on @user_script:4.

at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server, T defaultValue) in //src/StackExchange.Redis/ConnectionMultiplexer.cs:line 2099 at StackExchange.Redis.RedisDatabase.ScriptEvaluate(String script, RedisKey[] keys, RedisValue[] values, CommandFlags flags) in //src/StackExchange.Redis/RedisDatabase.cs:line 1520 at Abp.Runtime.Caching.Redis.RedisDatabaseExtensions.KeyDeleteWithPrefix(IDatabase database, String prefix) at Abp.Runtime.Caching.AbpCacheBase.ClearAsync()

mgravell commented 1 year ago

Can we see the script, and how you invoke it please? In particular, what matters is how you are specifying and using keys: in cluster mode, they must be passed via the keys arg, not as values - and you must not generate or hard-code keys inside the script, that are not specified via keys; we need to use keys to a: route to the correct node, and b: assert that all keys are on the same shard. If you don't pass the keys via keys: we can't do this for you, and bad things may occur

mgravell commented 1 year ago

btw: if you're using the alternative script API where you pass in an object for the args rather than keys: make sure that parameters that are keys are typed as RedisKey - the library will then know what to do; if the same value is passed as a string, it will be passed via ARGV rather than KEYS, and will not contribute to routing; this is usually as simple as changing:

int id = ...
string key = ...
... new { id, key }

to either:

int id = ...
RedisKey key = ...
... new { id, key }

or

int id = ...
string key = ...
... new { id, key = (RedisKey)key }

mgravell commented 1 year ago

(additional observation; if your staging environment is not clustered, but your production is clustered: it will miss a wide range of cluster-specific scenarios that you probably want to validate against)

sohaibameenpk007 commented 1 year ago

@mgravell Staging environment has 2 cluster nodes and PROD has 3 so I feel it should have failed on STAGE as well but it did not.

sohaibameenpk007 commented 1 year ago

I am using this like this by passing a keycache to clear it from Redis but it throws an exception since it is trying to locate this key on a cluster node where it does not exist.

await _cacheManager.GetCache(keyCache).ClearAsync();

@mgravell can you explain how I effectively check if the key exists on redis or not?

mgravell commented 1 year ago

Staging environment has 2 cluster nodes and PROD has 3 so I feel it should have failed on STAGE as well but it did not.

That's fair; I'm surprised that this bit you, too

await _cacheManager.GetCache(keyCache).ClearAsync();

OK, but AFAIK that isn't any of the library code - I don't know what that is, so I can't comment; do you have any of the actual code that is calling into the script API? or are you using some other 3rd party library on top of SE.Redis?

can you explain how I effectively check if the key exists on redis or not?

db.KeyExists

mgravell commented 1 year ago

If you're using "ASP.NET Boilerplate": this seems like a question for them; I can only comment on things that directly use the API

mgravell commented 1 year ago

if you're using Abp.RedisCache, I'm going to recommend "oh god, don't use that"; my reasons:

https://github.com/aspnetboilerplate/aspnetboilerplate/blob/a5aa294489c720c15a90b35462a0e8654b377963/src/Abp.RedisCache/Runtime/Caching/Redis/RedisDatabaseExtensions.cs#L23C17-L27 and https://github.com/aspnetboilerplate/aspnetboilerplate/blob/a5aa294489c720c15a90b35462a0e8654b377963/src/Abp.RedisCache/Runtime/Caching/Redis/RedisDatabaseExtensions.cs#L42

a: the KEYS command is not suitable for production use (it will cause server stalls); it is a: replaced by SCAN (which works differently), and b: even then, SCAN is only suitable for database management purposes, not routine usage b: that code is not designed with "redis cluster" in mind:

it violates the "you must not generate [or removed] keys inside the script" - this isn't a SE.Redis limitation: this is a redis-cluster limitation
it only considers whatever random node you land on (KEYS and SCAN are node-specific operations, not database-wide operations)

mgravell commented 1 year ago

looking at the ClearAsync() path from, your stack-trace, that comes in here: https://github.com/aspnetboilerplate/aspnetboilerplate/blob/a5aa294489c720c15a90b35462a0e8654b377963/src/Abp.RedisCache/Runtime/Caching/Redis/AbpRedisCache.cs#L351 which calls into the code I opined on above - so... yeah, that's definitely the "why is this happening?"; honestly, this (Abp, which I've never heard of until today) is just bad code that doesn't understand a: the semantics of redis-cluster, or b: the impact of certain redis operations like keys

ismcagdas commented 1 year ago

@mgravell I'm one of the maintainers of ASP.NET Boilerplate. This feature was mostly a community contribution. Is it possible for you to redirect me a documenation so I can understand this concept better and enhance the Abp.RedisCache package ? Thanks.

mgravell commented 1 year ago

@ismcagdas for general cluster mode, the rules are as per https://redis.io/docs/management/scaling/#:~:text=Redis%20Cluster%20supports%20multiple%20key,a%20feature%20called%20hash%20tags. with comments specific to scripting here: https://redis.io/docs/interact/programmability/eval-intro#script-parameterization

the short version is: only a single "slot" can be used per command, although multi-key commands are still valid, this is harder and usually involves "hash tags" (see https://redis.io/docs/reference/cluster-spec#implemented-subset) - as for KEYS, that should almost never be used - SCAN is almost always preferred, although that guidance only really helps for non-scripting scenarios; for scripting... oof, honestly, I'd want to redesign this implementation to not ever need KEYS/SCAN (which can be done, but it isn't trivial)

Designing redis operations for use on cluster is trickier than vanilla redis because of the slot concerns; designs that rely on inter-related keys may need to be implemented differently in cluster. Hash tags can help, but it is easy to end up in a scenario where one shard takes all the load, making the cluster massively unbalanced

ismcagdas commented 1 year ago

@mgravell thank you very much for taking your time and sharing these info with me.

StackExchange / StackExchange.Redis

StackExchange.Redis.RedisServerException: ERR Script attempted to access a non local key in a cluster node script #2506