Closed jgadbois closed 12 years ago
I implemented here, but it is not solving the problem:
https://github.com/jgadbois/leaderboard/commit/59b65ba37f39babbcc4b71b3c856da345478c7d9
I still get the -ERR DISCARD without MULTI error
Could calling the method in a tight loop be causing this problem? Maybe I need to wrap my whole method call in a transaction? I'm really confused as to why this is happening (doesn't happen locally, only on RedisToGo through Heroku)
This didn't seem to fix it - RedisToGo upgraded my Redis server to 2.4.x and that fixed the problem it seems
Glad that it was fixed with upgrading Redis. What was the version of Redis you were using where you could cause the issue? Was it pre-Redis 2.4.4?
Yes it was 2.2 I believe. I actually am still intermittently getting the error. I switched back to my branch and it seemed to start working again, but I don't know if adding the multi really helped or is just coincidental to my app restarting or something like that.
Is the error happening IN the members_from_score_range_in
method or is it happening down in the ranked_in_list_in
method? I can't see how it would be happening in the former until you drop down into the ranked_in_list_in
method.
My call is to members_from_score_range
, my stack trace doesn't give me the line number in the leaderboard gem
Right, members_from_score_range
will drop into members_from_score_range_in
. Shame that you don't have more of a stacktrace.
Yeah, it's frustrating.
You don't see a more full stacktrace when running heroku logs -t
. If you get a chance and you can isolate a small test program, might be useful for all parties, Redis, Redis ruby client and us.
Yeah, I got my trace from tailing the heroku logs: Here's my method causing the issue that could be run standalone (although it doesn't break on my machine)
def get_rank_for leaderboard, metric, record
range = 50
first = true
if metric
if(record.nil? || metric >= record)
return 1
else
min = metric - range
max = metric + range
while (min > 0 || max < record || first)
if(leaderboard.total_members_in_score_range(min, max) > 0)
members = leaderboard.members_from_score_range(min, max)
members.each do |m|
if m[:score] <= metric
return m[:rank]
end
end
return leaderboard.total_members + 1
end
min = min - range
max = max + range
first = false
end
end
end
end
Could something go wrong if there is a huge number of members returned in a given range?
I guess something could go wrong in terms of network connectivity breaking between Heroku and RedisToGo if there are a large number of results being returned, but that'd fall in the realm of the Ruby client. Are you able to run some range tests to see over a given range how that might be causing the issue? Does the error even happen at all if you run the test over and over for a small min and max?
I just tried running my method in a tight loop 1000 times with 10k members in redis and didn't have any problems. Not sure what I did would have tripped anything, but I loaded them with score = i/5.0 so a range of 100 would return 500 members.
Thanks for digging in further. Let me know if you see any odd behavior with a larger range. I've been able to submit 1,000,000 scores to the rank_members
method, which does a transaction, but that was local testing.
I got some full stack traces in airbrake (unfortunately that means it's blowing up on production)
vendor/bundle/ruby/1.9.1/gems/redis-2.1.1/lib/redis.rb:565:in `multi'
vendor/bundle/ruby/1.9.1/gems/leaderboard-2.2.1/lib/leaderboard.rb:673:in `ranked_in_list_in'
vendor/bundle/ruby/1.9.1/gems/leaderboard-2.2.1/lib/leaderboard.rb:531:in `leaders_in'
vendor/bundle/ruby/1.9.1/gems/leaderboard-2.2.1/lib/leaderboard.rb:598:in `member_at_in'
vendor/bundle/ruby/1.9.1/gems/leaderboard-2.2.1/lib/leaderboard.rb:580:in `member_at'
lib/records/leaderboard_manager.rb:46:in `record'
app/models/workout_exercise.rb:84:in `sitewide_record?'
app/models/workout_exercise_set.rb:14:in `sitewide_record?'
vendor/bundle/ruby/1.9.1/gems/redis-2.1.1/lib/redis.rb:568:in `rescue in multi'
vendor/bundle/ruby/1.9.1/gems/redis-2.1.1/lib/redis.rb:565:in `multi'
vendor/bundle/ruby/1.9.1/gems/leaderboard-2.2.1/lib/leaderboard.rb:673:in `ranked_in_list_in'
vendor/bundle/ruby/1.9.1/gems/leaderboard-2.2.1/lib/leaderboard.rb:567:in `members_from_score_range_in'
vendor/bundle/ruby/1.9.1/gems/leaderboard-2.2.1/lib/leaderboard.rb:547:in `members_from_score_range'
lib/records/leaderboard_manager.rb:106:in `get_rank_for'
lib/records/leaderboard_manager.rb:85:in `workout_rank'
app/models/workout.rb:83:in `sitewide_rank'
Any ideas? Happy to do anything to help. I have pages blowing up all over now (my own fault for not testing more under load)
Any chance you could try using version 2.2.2 of the redis gem? It looks like there are some changes with multi/exec that may help here with this issue in that it should not raise any exceptions. I think the error here though is within Redis or the connection to Redis.
will do - actually just working on that as we speak, thanks
My app only has one REDIS connection - could that be causing a problem if there are a lot of users on?
We run with single connections to Redis in our production environment. But here it looks like here an error is being thrown by Redis in retrieving the data. I could be wrong.
Did updating to version 2.2.2 of the redis gem help any?
I deployed to production about 10 minutes ago and so far yes that seems to be helping. Not enough data yet though. Another potential cause is maybe if Rack::Timeout kicks in and kills the request while Redis is processing data? With the old client the page did seem to be loading abnormally slow - seems to be faster with the newer client version
What is Rack::Timeout set to? In our production environment, we use a timeout of 30 seconds for Unicorn. This could certainly be causing an issue if the Rack::Timeout value is set too small. Obviously if were pulling a large amount of data from Redis, there could be a timeout, but I'm not sure how much you're pulling via the members_from_score_range_in
method.
Do you have a connection timeout set in redis.conf? By default, I believe it is set to 0 to be disabled.
Looks like redistogo sets timeout to 150
Our Rack::Timeout is also set to 30 seconds but it was going super slow even when I removed the members_from_score_range_in loop to get rank.
I'll keep digging, sorry this is getting somewhat off topic...I appreciate the help.
Sure, it's not a problem. Happy to help try and diagnose the issue. Sorry that it has to be in your production environment, but it'd be tough to replicate the environment to run any tests.
No news is good news? Any update?
I think upgrading the client fixed it. Not seeing the error anymore so far. It was still kind of slow (my algorithm?) so I added some fragment_caching to minimize hits to redis on this particular page. Overall I think it is going to work nicely. I have a leaderboard per exercise in our database (~1000) so should get some good use out of the gem :)
Thanks again and thanks for releasing this gem.
Good to hear. The differences between versions here would indicate that's the issue:
2.1.1 - https://github.com/ezmobius/redis-rb/blob/c60d5ab5aa2257e890cb047c69f0171db983a93b/lib/redis.rb#L565-570 2.2.2 - https://github.com/redis/redis-rb/blob/be3d7e0b4194fcb7e91bee9153daf6073dbc1268/lib/redis.rb#L1013-1017
That's my guess.
You might also get some performance gains here by utilizing the hiredis driver.
That definitely looks like the culprit although not the root issue I guess. I'll check out the hiredis driver - thanks for pointing that out.
I'm getting an error on Heroku when using the members from score_range in method in a loop. Here's what RedisToGo support said: