bolshakov / stoplight

:traffic_light: Traffic control for code.
http://bolshakov.github.io/stoplight/
MIT License
381 stars 40 forks source link

Possible invalid start and end values sent to Redis for ZRANGE #213

Closed mt-clearhaus closed 1 year ago

mt-clearhaus commented 1 year ago

Describe the bug Both Redis 6.2.6 (via Docker image) and AWS ElastiCache 6.0.5 complain when the strings Infinity and -Infinity are used as start or stop value for ZRANGE. For Stoplight 4.1.0 this is the case in Stoplight::DataStore::Redis#query_failures:

https://github.com/bolshakov/stoplight/blob/717936c742c32aed9c887d6615fd65b451ffe9f3/lib/stoplight/data_store/redis.rb#L139

If no custom window size is set for a light, the default Float::INFINITY will be used, and the start and stop values for ZRANGE will become the strings Infinity and -Infinity, respectively. This is because Time.now.to_i - Float::INFINITY equals -Infinity, and because RedisClient's command builder calls #to_s on Floats:

https://github.com/redis-rb/redis-client/blob/cfbe7feeebe696ee6f5c9a0a27a99bd792b67430/lib/redis_client/command_builder.rb#L34-L35

The error I see is this:

ERR value is not an integer or out of range

I'm using Ruby 3.1.2p20 on Debian Bookworm.

Redis MONITOR gives me this (the light's name is "primary"):

1693217647.398773 [0 10.0.35.172:45642] "multi"
1693217647.398791 [0 10.0.35.172:45642] "zrange" "stoplight:v4:failures:primary" "Infinity" "-Infinity" "BYSCORE" "REV"
1693217647.398800 [0 10.0.35.172:45642] "hget" "stoplight:v4:states" "primary"
1693217647.398807 [0 10.0.35.172:45642] "exec"

To Reproduce

irb(main):020:0> redis = Redis.new(...)
irb(main):021:0> redis.zrange('stoplight:v4:failures:primary', 'Infinity', '-Infinity')
/var/lib/gems/3.1.0/gems/redis-client-0.16.0/lib/redis_client/connection_mixin.rb:35:in `call': ERR value is not an integer or out of range (Redis::CommandError)
        from /var/lib/gems/3.1.0/gems/redis-client-0.16.0/lib/redis_client.rb:235:in `block (2 levels) in call_v'
        from /var/lib/gems/3.1.0/gems/redis-client-0.16.0/lib/redis_client/middlewares.rb:16:in `call'
        from /var/lib/gems/3.1.0/gems/redis-client-0.16.0/lib/redis_client.rb:234:in `block in call_v'
        from /var/lib/gems/3.1.0/gems/redis-client-0.16.0/lib/redis_client.rb:639:in `ensure_connected'
        from /var/lib/gems/3.1.0/gems/redis-client-0.16.0/lib/redis_client.rb:233:in `call_v'    
        from /var/lib/gems/3.1.0/gems/redis-5.0.7/lib/redis/client.rb:90:in `call_v'             
        from /var/lib/gems/3.1.0/gems/redis-5.0.7/lib/redis.rb:167:in `block in send_command'
        from /var/lib/gems/3.1.0/gems/redis-5.0.7/lib/redis.rb:166:in `synchronize'
        from /var/lib/gems/3.1.0/gems/redis-5.0.7/lib/redis.rb:166:in `send_command'
        from /var/lib/gems/3.1.0/gems/redis-5.0.7/lib/redis/commands/sorted_sets.rb:394:in `zrange'
        from (irb):21:in `<main>'     
    from /var/lib/gems/3.1.0/gems/padrino-core-0.15.3/lib/padrino-core/cli/base.rb:43:in `console'
    from /var/lib/gems/3.1.0/gems/thor-1.2.2/lib/thor/command.rb:27:in `run'
    from /var/lib/gems/3.1.0/gems/thor-1.2.2/lib/thor/invocation.rb:127:in `invoke_command'
    from /var/lib/gems/3.1.0/gems/thor-1.2.2/lib/thor.rb:392:in `dispatch'
    from /var/lib/gems/3.1.0/gems/thor-1.2.2/lib/thor/base.rb:485:in `start'
    ... 3 levels...
/var/lib/gems/3.1.0/gems/redis-client-0.16.0/lib/redis_client/connection_mixin.rb:35:in `call': ERR value is not an integer or out of range (RedisClient::CommandError)
    from /var/lib/gems/3.1.0/gems/redis-client-0.16.0/lib/redis_client.rb:235:in `block (2 levels) in call_v'
    from /var/lib/gems/3.1.0/gems/redis-client-0.16.0/lib/redis_client/middlewares.rb:16:in `call'
    from /var/lib/gems/3.1.0/gems/redis-client-0.16.0/lib/redis_client.rb:234:in `block in call_v'
    from /var/lib/gems/3.1.0/gems/redis-client-0.16.0/lib/redis_client.rb:639:in `ensure_connected'
    from /var/lib/gems/3.1.0/gems/redis-client-0.16.0/lib/redis_client.rb:233:in `call_v'
    from /var/lib/gems/3.1.0/gems/redis-5.0.7/lib/redis/client.rb:90:in `call_v'
    from /var/lib/gems/3.1.0/gems/redis-5.0.7/lib/redis.rb:167:in `block in send_command'
    from /var/lib/gems/3.1.0/gems/redis-5.0.7/lib/redis.rb:166:in `synchronize'
    from /var/lib/gems/3.1.0/gems/redis-5.0.7/lib/redis.rb:166:in `send_command'
    from /var/lib/gems/3.1.0/gems/redis-5.0.7/lib/redis/commands/sorted_sets.rb:394:in `zrange'
    from (irb):21:in `<main>'
    from /var/lib/gems/3.1.0/gems/padrino-core-0.15.3/lib/padrino-core/cli/base.rb:43:in `console'
    from /var/lib/gems/3.1.0/gems/thor-1.2.2/lib/thor/command.rb:27:in `run'
    from /var/lib/gems/3.1.0/gems/thor-1.2.2/lib/thor/invocation.rb:127:in `invoke_command'
    from /var/lib/gems/3.1.0/gems/thor-1.2.2/lib/thor.rb:392:in `dispatch'
    from /var/lib/gems/3.1.0/gems/thor-1.2.2/lib/thor/base.rb:485:in `start'
    ... 3 levels...

Expected behavior I hope Float::INFINITY can be replaced with something else, however, I not sure what the right solution is.

Screenshots N/A

Environment (please complete the following information):

Additional context N/A

mt-clearhaus commented 1 year ago

FWIW, I currently use this workaround (monkey patch):

# Monkey patch two private methods in Stoplight::DataStore::Redis such that
# Stoplight does not end up sending the strings "Infinity" and "-Infinity" as
# start/min and stop/max values in Redis ZREMRANGEBYSCORE and ZRANGE commands.
# See:
# * https://github.com/bolshakov/stoplight/issues/213
# * https://redis.io/commands/zremrangebyscore/
# * https://redis.io/commands/zrange/
module Stoplight
  module DataStore
    class Redis
      private

      def remove_outdated_failures(light, time, transaction: @redis)
        failures_key = failures_key(light)

        # Remove all errors happened before the window start
        window_start = time.to_i - light.window_size
        window_start = 0 if window_start < 0
        transaction.zremrangebyscore(failures_key, 0, window_start)
        # Keep at most +light.threshold+ number of errors
        transaction.zremrangebyrank(failures_key, 0, -light.threshold - 1)
      end

      def query_failures(light, transaction: @redis)
        window_start = Time.now.to_i - light.window_size
        window_start = 0 if window_start < 0

        transaction.zrange(failures_key(light), 10_000_000_000, window_start, rev: true, by_score: true)
      end
    end
  end
end

MONITOR before:

1693231301.688236 [0 192.168.48.3:57864] "MULTI"
1693231301.688257 [0 192.168.48.3:57864] "zrange" "stoplight:v4:failures:primary" "Infinity" "-Infinity" "BYSCORE" "REV"
1693231301.688262 [0 192.168.48.3:57864] "hget" "stoplight:v4:states" "primary"
1693231301.688264 [0 192.168.48.3:57864] "EXEC"
1693231305.713619 [0 192.168.48.3:57864] "MULTI"
1693231305.713665 [0 192.168.48.3:57864] "zadd" "stoplight:v4:failures:primary" "1693231305" "{\"error\":{\"class\":\"Net::TCPClient::ConnectionFailure\",\"message\":\"[...]\"},\"time\":\"2023-08-28T14:01:45.712843713+00:00\"}"
1693231305.713734 [0 192.168.48.3:57864] "zremrangebyscore" "stoplight:v4:failures:primary" "0" "-Infinity"
1693231305.713747 [0 192.168.48.3:57864] "zremrangebyrank" "stoplight:v4:failures:primary" "0" "-6"
1693231305.713758 [0 192.168.48.3:57864] "zcard" "stoplight:v4:failures:primary"
1693231305.713766 [0 192.168.48.3:57864] "EXEC"

MONITOR after:

1693231214.596847 [0 192.168.48.3:57226] "MULTI"
1693231214.596867 [0 192.168.48.3:57226] "zrange" "stoplight:v4:failures:primary" "10000000000" "0" "BYSCORE" "REV"
1693231214.596872 [0 192.168.48.3:57226] "hget" "stoplight:v4:states" "primary"
1693231214.596875 [0 192.168.48.3:57226] "EXEC"
1693231218.622114 [0 192.168.48.3:57226] "MULTI"
1693231218.622158 [0 192.168.48.3:57226] "zadd" "stoplight:v4:failures:primary" "1693231218" "{\"error\":{\"class\":\"Net::TCPClient::ConnectionFailure\",\"message\":\"[...]\"},\"time\":\"2023-08-28T14:00:18.621338758+00:00\"}"
1693231218.622223 [0 192.168.48.3:57226] "zremrangebyscore" "stoplight:v4:failures:primary" "0" "0"
1693231218.622237 [0 192.168.48.3:57226] "zremrangebyrank" "stoplight:v4:failures:primary" "0" "-6"
1693231218.622246 [0 192.168.48.3:57226] "zcard" "stoplight:v4:failures:primary"
1693231218.622254 [0 192.168.48.3:57226] "EXEC"
mt-clearhaus commented 1 year ago

It appears the workaround isn't sufficient. I'm seeing ERR syntax error now. Just FYI.

Interestingly, I only see it towards AWS ElastiCache, not towards ordinary Redis.

Towards ElastiCache:

irb(main):033:0> redis.zrange('stoplight:v4:failures:primary', 10_000_000_000, 0, byscore: true, rev: true)
/var/lib/gems/3.1.0/gems/redis-client-0.16.0/lib/redis_client/connection_mixin.rb:35:in `call': ERR syntax error (Redis::CommandError)
    from /var/lib/gems/3.1.0/gems/redis-client-0.16.0/lib/redis_client.rb:235:in `block (2 levels) in call_v'
[...]

Towards ordinary Redis:

irb(main):012:0> pp redis.zrange('stoplight:v4:failures:primary', 10_000_000_000, 0, byscore: true, rev: true)
["{\"error\":{[...]},\"time\":\"2023-08-28T14:31:51.948117847+00:00\"}",
 "{\"error\":{[...]},\"time\":\"2023-08-28T14:01:45.712843713+00:00\"}",
 "{\"error\":{[...]},\"time\":\"2023-08-28T14:00:18.621338758+00:00\"}"]
bolshakov commented 1 year ago

Hey @mt-clearhaus 👋 Thank you for reporting this!

irb(main):020:0> redis = Redis.new(...)
irb(main):021:0> redis.zrange('stoplight:v4:failures:primary', 'Infinity', '-Infinity')

The snippet you provided as reproduction steps is not equivalent to what Stoplight sends to Redis. Stoplight searches by score, while you're searching by rank. Sets' rank indeed should be an integer, while the score value is a float (see)

This is the correct ruby command:

redis.zrange('stoplight:v4:failures:primary', 'Infinity', '-Infinity', by_score: true, rev: true)

it's translated to Redis command:

zrange stoplight:v4:failures:test Infinity -Infinity BYSCORE REV

The command above works with Redis 6.2.13 for sure (we run tests against it) and Redis 7.2.0.

May I ask you to test if these two commands work on your version of Redis with and without using the ElastiCache?:

zrange stoplight:v4:failures:test Infinity -Infinity BYSCORE REV
zrange stoplight:v4:failures:test Inf -Inf BYSCORE REV
bolshakov commented 1 year ago

Actually, we can safely replace -Infinity with zero and +Infinity with large enough number because the score represent a timestamp in this case.

bolshakov commented 1 year ago

I tested it here using Redis 6.2.6 and the specs pass https://github.com/bolshakov/stoplight/pull/214

mt-clearhaus commented 1 year ago

Hi @bolshakov,

Thank you very much for creating Stoplight and for getting back to me.

It turns out my issues are only present for AWS ElastiCache. The engine version is only 6.0.5.

I have primed both my local Redis 6.2.6 and the AWS ElastiCache with two failures in the primary light.

Things work flawlessly with the local Redis 6.2.6:

irb(main):001:0> redis = Redis.new([...])
=> #<Redis client v5.0.7 for redis://redis:6379/0>
irb(main):002:0> pp redis.zrange('stoplight:v4:failures:primary', 'Infinity', '-Infinity', by_score: true, rev: true)
["{\"error\":{[...]},\"time\":\"2023-08-29T07:20:13.185982997+00:00\"}",
 "{\"error\":{[...]},\"time\":\"2023-08-29T07:16:51.963196041+00:00\"}"]
irb(main):003:0> pp redis.zrange('stoplight:v4:failures:primary', 'Inf', '-Inf', by_score: true, rev: true)
["{\"error\":{[...]},\"time\":\"2023-08-29T07:20:13.185982997+00:00\"}",
 "{\"error\":{[...]},\"time\":\"2023-08-29T07:16:51.963196041+00:00\"}"]
127.0.0.1:6379> zrange stoplight:v4:failures:primary Infinity -Infinity BYSCORE REV
1) "{\"error\":{[...]},\"time\":\"2023-08-29T07:20:13.185982997+00:00\"}"
2) "{\"error\":{[...]},\"time\":\"2023-08-29T07:16:51.963196041+00:00\"}"

127.0.0.1:6379> zrange stoplight:v4:failures:primary Inf -Inf BYSCORE REV
1) "{\"error\":{[...]},\"time\":\"2023-08-29T07:20:13.185982997+00:00\"}"
2) "{\"error\":{[...]},\"time\":\"2023-08-29T07:16:51.963196041+00:00\"}"

It's another story with AWS ElastiCache:

irb(main):001:0> redis = Redis.new([...])
=> #<Redis client v5.0.7 for redis://[...].cache.amazonaws.com:6379/0>
irb(main):002:0> pp redis.zrange('stoplight:v4:failures:primary', 0, 100)
["{\"error\":{[...]},\"time\":\"2023-08-29T07:37:00.265572267+00:00\"}",
 "{\"error\":{[...]},\"time\":\"2023-08-29T07:38:00.749749018+00:00\"}"]

irb(main):003:0> redis.zrange('stoplight:v4:failures:primary', 'Infinity', '-Infinity', by_score: true, rev: true)
[...]/redis-client-0.16.0/lib/redis_client/connection_mixin.rb:35:in `call': ERR value is not an integer or out of range (Redis::CommandError)
irb(main):004:0> redis.zrange('stoplight:v4:failures:primary', 'Inf', '-Inf', by_score: true, rev: true)
[...]/redis-client-0.16.0/lib/redis_client/connection_mixin.rb:35:in `call': ERR value is not an integer or out of range (Redis::CommandError)
[...].cache.amazonaws.com:6379> zrange stoplight:v4:failures:primary 0 100
1) "{\"error\":{[...]},\"time\":\"2023-08-29T07:37:00.265572267+00:00\"}"
2) "{\"error\":{[...]},\"time\":\"2023-08-29T07:38:00.749749018+00:00\"}"

[...].cache.amazonaws.com:6379> zrange stoplight:v4:failures:primary Infinity -Infinity BYSCORE REV
(error) ERR value is not an integer or out of range
[...].cache.amazonaws.com:6379> zrange stoplight:v4:failures:primary Inf -Inf BYSCORE REV
(error) ERR value is not an integer or out of range
[...].cache.amazonaws.com:6379> zrange stoplight:v4:failures:primary 10000000000 0 BYSCORE REV
(error) ERR syntax error

I will try to upgrade the engine version in AWS ElastiCache.

mt-clearhaus commented 1 year ago

I think it is safe to close this issue as it seems Stoplight is not to blame :slightly_smiling_face:

mt-clearhaus commented 1 year ago

AWS ElastiCache with engine version 6.2.6 has no issues with the queries :+1: :raised_hands:

[...].cache.amazonaws.com:6379> zrange stoplight:v4:failures:primary 0 100
1) "{\"error\":{[...]},\"time\":\"2023-08-29T08:35:28.659990873+00:00\"}"
2) "{\"error\":{[...]},\"time\":\"2023-08-29T08:35:35.481946177+00:00\"}"
[...].cache.amazonaws.com:6379> zrange stoplight:v4:failures:primary Infinity -Infinity BYSCORE REV
1) "{\"error\":{[...]},\"time\":\"2023-08-29T08:35:35.481946177+00:00\"}"
2) "{\"error\":{[...]},\"time\":\"2023-08-29T08:35:28.659990873+00:00\"}"
[...].cache.amazonaws.com:6379> zrange stoplight:v4:failures:primary Inf -Inf BYSCORE REV
1) "{\"error\":{[...]},\"time\":\"2023-08-29T08:35:35.481946177+00:00\"}"
2) "{\"error\":{[...]},\"time\":\"2023-08-29T08:35:28.659990873+00:00\"}"
[...].cache.amazonaws.com:6379> zrange stoplight:v4:failures:primary 10000000000 0 BYSCORE REV
1) "{\"error\":{[...]},\"time\":\"2023-08-29T08:35:35.481946177+00:00\"}"
2) "{\"error\":{[...]},\"time\":\"2023-08-29T08:35:28.659990873+00:00\"}"
bolshakov commented 1 year ago

Glad you managed to pinpoint the issue!

[...].cache.amazonaws.com:6379> zrange stoplight:v4:failures:primary 10000000000 0 BYSCORE REV (error) ERR syntax error

Right! In Redis 6.2.0, the ZRANGEBYSCORE was deprecated since they introduced the BYSCORE argument for ZRANGE command. It seems this version of AWS ElastiCache does not support the BYSCORE argument yet.

Please, keep me updated if you managed to run it with ElastiCache

mt-clearhaus commented 1 year ago

The syntax error quoted above was with ElastiCache engine version 6.0.5, not 6.2.6. I believe the issue is that the integer 10000000000 is too large. As shown in https://github.com/bolshakov/stoplight/issues/213#issuecomment-1697012198 it seems 6.2.6 supports BYSCORE with ZRANGE.

As 6.0.5 doesn't support integers (or scores in a sorted set? :thinking:) of that size, it wouldn't be possible to workaround (-)Inf(inity) using a large integer in the case with Stoplight where the score is seconds since Epoch. So, AFAICT, I couldn't avoid upgrading my ElastiCache engine version :slightly_smiling_face:

Edit: After some thought: You might be right that the syntax error was caused by the BYSCORE word and not the integer size. In any case it seems the word is accepted by ElastiCache with engine version 6.2.6.