Closed lucas-aragno closed 2 years ago
We're also running into this issue and this PR resolves it.
After almost 2 weeks of testing this patch on production, we're running into infinite(look like) deep scan calls, Redis v4 and we had to roll it back.
irb(main):028:0> SidekiqAlive.registered_instances
/usr/src/app/vendor/bundle/ruby/3.0.0/gems/redis-4.5.1/lib/redis/connection/ruby.rb:36:in `read': stack level too deep (SystemStackError)
irb(main):029:1* module SidekiqAlive
irb(main):030:2* def self.deep_scan(keyword, keys = [], cursor = 0)
irb(main):031:2* next_cursor, found_keys = *redis { |r| r }.scan(cursor, match: keyword)
irb(main):032:2* keys += found_keys
irb(main):033:2* return keys if next_cursor == "0" || found_keys.blank?
irb(main):034:2* deep_scan(keyword, keys, next_cursor)
irb(main):035:1* end
irb(main):036:1*
irb(main):037:0> end
=> :deep_scan
irb(main):038:0> SidekiqAlive.registered_instances
=> []
This is because we have a ton of keys inside redis, and it's running out of stack.
Changing this into a loop, not a recursive call should resolve the issue. I'll be doing more tests on that.
@arathunku I actually found the same issue as you after testing this fix and came up with this (1000 worked for us but maybe you want to make it configurable):
loop do
cursor, found_keys = SidekiqAlive.redis.scan(cursor, match: keyword, count: 1000)
keys += found_keys
break if cursor.to_i == 0
end
keys
its working alright for us. I may update this PR with the fix today. I havent done that yet bc I haven't see any participation from mantainers tbh
lmk if that helps.
It will 100% help! Thanks for the snippet, I'll test it too, it shouldn't run out of Stack anymore :) I think the test instance I was running it on had over 60k keys
We're running on a custom fork with additional changes due to: https://github.com/arturictus/sidekiq_alive/pull/40#discussion_r752892803
If this works for you lmk and I'll update this PR and see if we can get it merged on master!
@lucas-aragno we've been using it on test and prod instances for over a week now, 0 problems :)
@arathunku perfect! ill update this pr soon.
Hi @lucas-aragno and @arathunku, I've been out for some time now, sorry for not answering what is the conclusion for this PR?
The code that is working for both is this one?
loop do
cursor, found_keys = SidekiqAlive.redis.scan(cursor, match: keyword, count: 1000)
keys += found_keys
break if cursor.to_i == 0
end
keys
Which means that this PR is invalid or has to be updated if I understand correctly
Thanks for the help
Hey @arturictus ! I still have to update this pr with that last code. I'll do that today and re-request the review. Thank you!
@arathunku @arturictus I just rebased and pushed the new changes. Could you folks take a look?
Looks OK to me, that's the version we've been using on prod since January. Thank you @lucas-aragno and @arturictus for helping with this
Great! then I think is up to @arturictus now :)
Thanks @lucas-aragno and @arathunku. Merging and releasing now
This PR is somewhat related to this issue. We were having this problem on our environments and we tried the approach described on this comment.
We found out that the
deep_scan
method doesn't work as expected.It looks like if it doesn't find it on the "first page" of the scan it returns empty:
(we added the
puts "===== next cursor #{next_cursor} found keys #{found_keys}"
to the method)since
next_cursor
wasn't "0" and the keys weren't found it just halted and returned[]
which is strange since it should re-start iffound_keys
isblank?
nevertheless I think the only condition for
deep_scan
should be that the cursor is"0"
if we base ourselves on theSCAN
method description from redis:besides that it seems to only return results if they are found on the first iteration, looks like it will also break if needs to perform further iterations. You can see this by updating a spec on
sidekiq_alive_spec.rb
this will generate some extra keys on redis before registering and looking up the instance. This fails with the following error
Because the key was after the first 10,
next_cursor == "0"
wasfalse
so it tried tofound_keys.blank?
of[]
and Arrays do not support the.blank?
method.So our solution was to take the
|| found_keys.blank?
altogether and seems to be performing well so far. I think that member of the condition introduces more issues than it fixes since most folks will get the same output if getting the results on the first page and this will better support multi paged scans.