bwlewis / rredis

R client for Redis
http://illposed.net/

Race condition in bulk replies? #5

Closed sherbert closed 12 years ago

sherbert commented 12 years ago

I think I am running into some sort of race condition in the hash get functions that receive bulk replies. When I make a request for a lot of data using either HMGET or HGETALL, I can get the following error:

```
Error in if (nchar(l) < 2) { : argument is of length zero
Calls: redisCmd ... replicate -> sapply -> lapply -> FUN -> .getResponse
```

I am running rredis 1.6.6 against Redis server version 2.4.10. The code I am running to test this is:

```r
redisConnect('redis host')
a <- redisCmd('HGETALL', 'test_redis_bug', raw=TRUE)
redisClose()
```

test_redis_bug is a hash containing 400 vectors of 51,698 doubles and a single vector of 51,698 character strings, all of which have been manually run through serialize() before storage (this is also why they are being retrieved in raw mode).
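For context, the hash was populated along these lines (a hypothetical reconstruction; the field names, the use of `redisCmd('HSET', ...)`, and the single-field example are my illustration, not the exact loading script):

```r
library(rredis)

redisConnect('redis host')

# serialize() with connection = NULL returns a raw vector, which is
# why the values must later be fetched with raw=TRUE
v <- serialize(rnorm(51698), connection = NULL)

# store one serialized vector as one hash field
redisCmd('HSET', 'test_redis_bug', 'field_1', v)

redisClose()
```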

For small enough requests, I will never see this problem, while for large enough requests it happens every time. There is a middle ground where it happens inconsistently. “Large” seems to vary depending on the speed of the connection to the Redis server, but over our metro Ethernet (which runs at tens of megabits per second), I see consistent problems on this 400 x 50,000 example.

I did a little bit of poking around, and when I see this error, `l` is `character(0)`. If I read the next line, it becomes an empty string, and if I read the line after that I get `"$<some length>"`, as expected.

I suspect a race condition because putting a `Sys.sleep(1)` at the top of `.getResponse()` fixes everything.

I was able to improve the situation by looping the `l <- readLines(con=con, n=1)` call until it returned something real, but then I see a different, less consistent error in which readLines() runs out of material to read before reaching an end-of-line; that is followed by a garbled message on my next pull (which I theorize is data from the first request that didn't arrive in time).
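The retry workaround described above looks roughly like this (a sketch of my patch, not the library's actual `.getResponse()` code; `con` stands for the open socket connection that rredis maintains):

```r
# Retry the reply-header read until readLines() actually returns a line,
# since on a non-blocking socket it can return character(0) when no
# complete line has arrived yet.
l <- readLines(con = con, n = 1)
while (length(l) == 0) {
  Sys.sleep(0.01)                     # give the socket time to fill
  l <- readLines(con = con, n = 1)
}
# l should now hold a reply header such as "$51698" -- but this still
# races: readLines() can return a *partial* line if the terminating
# \r\n has not arrived, which matches the garbled-message symptom.
```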

It's been frustrating to try to figure out more because trying to look at anything that happens before the error seems to introduce enough delay that the race isn't triggered. Let me know if I can do anything to help in tracking this down.

bwlewis commented 12 years ago

Sorry about this.

I'm traveling back from a conference this weekend. I'll try to figure out what's going on as soon as possible, and will likely ask for your help testing the fix.

Best,

Bryan

sherbert commented 12 years ago

Sure, let me know if I can be of any help in debugging or tracking this down.

sherbert commented 12 years ago

I just saw that 1.6.7 was out, and as far as my testing shows, this issue is now fixed. I assume this is due to "Nonblocking connections in R are problematic, we switched to blocking mode." Will reopen if I manage to make this happen again, but things are looking good.
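For anyone hitting a similar symptom: my reading of that changelog note (the actual rredis internals may differ, and the `open` mode and timeout here are guesses) is that the client now opens its socket in blocking mode, so reads wait for complete data instead of returning early with `character(0)` or partial lines:

```r
# Blocking mode: readLines()/readBin() on this connection wait until a
# complete line / the requested bytes are available, eliminating the
# empty-read race seen with non-blocking sockets.
con <- socketConnection(host = "redis host", port = 6379,
                        open = "a+b", blocking = TRUE, timeout = 10)
```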