card323 / redis

Automatically exported from code.google.com/p/redis
0 stars 0 forks source link

Redis hits Assertion Failed with Jedis 2.0.0 and another client in brpoplpush loop #593

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
2.2.{9,10} on OSX 10.6.6, from Brew. Using Jedis 2.0.0 with Java 1.6 and 
python's redis 2.4.5 package. redis is started via redis-server with no 
explicit configuration

The java process does a BRPOPLPUSH via a JedisPool, and then a LPUSH to a new 
list and a LREM to the backup list (the second arg to BRPOPLPUSH) before 
looping. The python process does this:

>>> while True:
...   r.brpoplpush('glass-out', 'glass-in')
... 
'DATA'
'DATA'
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "build/bdist.macosx-10.6-universal/egg/redis/client.py", line 680, in brpoplpush
  File "build/bdist.macosx-10.6-universal/egg/redis/client.py", line 254, in execute_command
  File "build/bdist.macosx-10.6-universal/egg/redis/client.py", line 242, in _execute_command
  File "build/bdist.macosx-10.6-universal/egg/redis/connection.py", line 54, in send
  File "build/bdist.macosx-10.6-universal/egg/redis/connection.py", line 35, in connect
redis.exceptions.ConnectionError: Error 61 connecting localhost:6379. 
Connection refused.

The crash happens quickly, and redis crashes with this output:

[34096] 22 Jun 15:04:06 # === ASSERTION FAILED ===
[34096] 22 Jun 15:04:06 # ==> networking.c:706 'c->argc == 0' is not true
[34096] 22 Jun 15:04:06 # (forcing SIGSEGV in order to print the stack trace)
[34096] 22 Jun 15:04:06 # ======= Ooops! Redis 2.2.10 got signal: -11- =======
[34096] 22 Jun 15:04:06 # redis_version:2.2.10
redis_git_sha1:00000000
redis_git_dirty:0
arch_bits:64
multiplexing_api:kqueue
process_id:34096
uptime_in_seconds:1173
uptime_in_days:0
lru_clock:854600
used_cpu_sys:0.17
used_cpu_user:0.12
used_cpu_sys_childrens:0.00
used_cpu_user_childrens:0.00
connected_clients:3
connected_slaves:0
client_longest_output_list:0
client_biggest_input_buf:57
blocked_clients:0
used_memory:955744
used_memory_human:933.34K
used_memory_rss:1150976
mem_fragmentation_ratio:1.20
use_tcmalloc:0
loading:0
aof_enabled:0
changes_since_last_save:9
bgsave_in_progress:0
last_save_time:1308779073
bgrewriteaof_in_progress:0
total_connections_received:4
total_commands_processed:27
expired_keys:0
evicted_keys:0
keyspace_hits:5
keyspace_misses:13
hash_max_zipmap_entries:512
hash_max_zipmap_value:64
pubsub_channels:0
pubsub_patterns:0
vm_enabled:0
role:master
allocation_stats:6=1,8=16,9=10,10=13,11=7,12=5,13=49,14=35,15=32,16=10187,17=24,
18=13,19=20,20=6,21=3,22=2,23=1,24=191,25=2,26=
[34096] 22 Jun 15:04:06 # 1   redis-server                        
0x000000010002b36f 0x0 + 4295144303
[34096] 22 Jun 15:04:06 # 2   ???                                 
0x0000000000020000 0x0 + 131072
[34096] 22 Jun 15:04:06 # 3   redis-server                        
0x0000000100010e63 0x0 + 4295036515
[34096] 22 Jun 15:04:06 # 4   redis-server                        
0x00000001000111e2 0x0 + 4295037410
[34096] 22 Jun 15:04:06 # 5   redis-server                        
0x00000001000112e0 0x0 + 4295037664
[34096] 22 Jun 15:04:06 # 6   redis-server                        
0x0000000100001871 0x0 + 4294973553
[34096] 22 Jun 15:04:06 # 7   redis-server                        
0x0000000100001b7e 0x0 + 4294974334
[34096] 22 Jun 15:04:06 # 8   redis-server                        
0x00000001000073c8 0x0 + 4294996936
[34096] 22 Jun 15:04:06 # 9   redis-server                        
0x00000001000008c4 0x0 + 4294969540

I was unable to repro by creating two python processes doing brpoplpush in 
opposite directions on the same two lists, so I think Jedis is handling things 
poorly. I'm also filing a bug with them.

Original issue reported on code.google.com by kevin.clark@gmail.com on 22 Jun 2011 at 10:16

GoogleCodeExporter commented 8 years ago
Thank you for this report. Can you be more specific about the commands you 
execute on the Java side? In particular the key names and sequence of commands 
(when are the BRPOPLPUSH operations unblocked, when are the LPUSH operations 
called) will be helpful in tracking down how this assert can fail. I've looked 
into this for a bit, but was unable to find a sequence to reproduce this issue.

Original comment by pcnoordh...@gmail.com on 6 Jul 2011 at 10:54

GoogleCodeExporter commented 8 years ago
If you could also send us a script to reproduce that we will be able to fix it 
much faster, thanks!

Original comment by anti...@gmail.com on 6 Jul 2011 at 11:08

GoogleCodeExporter commented 8 years ago
I've encountered similar bug, tracked it down to some point and wrote script to 
reproduce it. The script is attached to that comment and also available at 
https://gist.github.com/1188698

Think I just wrote it down here:

--------------------
#!/bin/sh

src/redis-cli -r 10000 LPUSH a data &
src/redis-cli -r 10000 BRPOPLPUSH a b 0 &
src/redis-cli -r 10000 BRPOPLPUSH b c 0 &

wait
--------------------

So, to reproduce that bug you need to repeat in 3 concurrent loops the 
following commands:

LPUSH a data
BRPOPLPUSH a b 0
BRPOPLPUSH b c 0

At some point in time, usually about 1 seconds since script start Redis stops 
with ASSERTION FAILED message. Interesting thing is that sometimes the bug just 
does not arise and script works well. In that case I restart Redis server for 
some time and eventually the bug will appears again.

I am very worried about this error, and hope you can find the time to fix it.

Linux foo 2.6.39-0-generic #5~20110427-Ubuntu SMP Wed Apr 27 17:41:08 UTC 2011 
i686 i686 i386 GNU/Linux

Intel i3

Redis output
------------
foo@foo:/opt/redis-2.2.10$ ./src/redis-server
[1029] 02 Sep 23:28:06 # Warning: no config file specified, using the default 
config. In order to specify a config file use 'redis-server /path/to/redis.conf'
[1029] 02 Sep 23:28:06 * Server started, Redis version 2.2.10
[1029] 02 Sep 23:28:06 # WARNING overcommit_memory is set to 0! Background save 
may fail under low memory condition. To fix this issue add 
'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the 
command 'sysctl vm.overcommit_memory=1' for this to take effect.
[1029] 02 Sep 23:28:06 * The server is now ready to accept connections on port 
6379
[1029] 02 Sep 23:28:06 - 0 clients connected (0 slaves), 539480 bytes in use
[1029] 02 Sep 23:28:08 - Accepted 127.0.0.1:34708
[1029] 02 Sep 23:28:08 - Accepted 127.0.0.1:34709
[1029] 02 Sep 23:28:08 - Accepted 127.0.0.1:34710
[1029] 02 Sep 23:28:08 - Accepted 127.0.0.1:34711
[1029] 02 Sep 23:28:08 - Accepted 127.0.0.1:34712
[1029] 02 Sep 23:28:08 # === ASSERTION FAILED ===
[1029] 02 Sep 23:28:08 # ==> networking.c:706 'c->argc == 0' is not true
[1029] 02 Sep 23:28:08 # (forcing SIGSEGV in order to print the stack trace)
[1029] 02 Sep 23:28:08 # ======= Ooops! Redis 2.2.10 got signal: -11- =======
[1029] 02 Sep 23:28:08 # redis_version:2.2.10
redis_git_sha1:00000000
redis_git_dirty:0
arch_bits:32
multiplexing_api:epoll
process_id:1029
uptime_in_seconds:2
uptime_in_days:0
lru_clock:1473944
used_cpu_sys:0.00
used_cpu_user:0.00
used_cpu_sys_childrens:0.00
used_cpu_user_childrens:0.00
connected_clients:5
connected_slaves:0
client_longest_output_list:0
client_biggest_input_buf:42
blocked_clients:0
used_memory:579400
used_memory_human:565.82K
used_memory_rss:1220608
mem_fragmentation_ratio:2.11
use_tcmalloc:0
loading:0
aof_enabled:0
changes_since_last_save:16
bgsave_in_progress:0
last_save_time:1314973686
bgrewriteaof_in_progress:0
total_connections_received:5
total_commands_processed:15
expired_keys:0
evicted_keys:0
keyspace_hits:15
keyspace_misses:14
hash_max_zipmap_entries:512
hash_max_zipmap_value:64
pubsub_channels:0
pubsub_patterns:0
vm_enabled:0
role:master
allocation_stats:4=4,6=1,8=5,9=9,10=42,11=11,12=10272,13=45,14=33,15=30,16=37,17
=16,18=10,19=23,20=6,21=2,22=2,23=2,24=32,25=2,26
[1029] 02 Sep 23:28:08 # ./src/redis-server(_redisAssert+0x57) [0x8074067]
[1029] 02 Sep 23:28:08 # ./src/redis-server(processMultibulkBuffer+0x270) 
[0x805d1c0]
[1029] 02 Sep 23:28:08 # ./src/redis-server(processInputBuffer+0x9a) [0x805d2ba]
[1029] 02 Sep 23:28:08 # ./src/redis-server(readQueryFromClient+0x70) 
[0x805d370]
[1029] 02 Sep 23:28:08 # ./src/redis-server(aeProcessEvents+0x12e) [0x8050f0e]
[1029] 02 Sep 23:28:08 # ./src/redis-server(aeMain+0x37) [0x80512a7]
[1029] 02 Sep 23:28:08 # ./src/redis-server(main+0x111) [0x8056491]
[1029] 02 Sep 23:28:08 # /lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xe7) 
[0xcd5e37]
[1029] 02 Sep 23:28:08 # ./src/redis-server() [0x80502f1]
Segmentation fault

Original comment by senotru...@gmail.com on 2 Sep 2011 at 2:37

Attachments:

GoogleCodeExporter commented 8 years ago
Hello, I can confirm that:

1) The script is actually able to reproduce the bug.
2) That the bug is still present in Redis 2.2.12.

I'm leaving for honeymoon but this is a critical stuff. Pieter: are you working 
this week? Otherwise I can care about it as this looks not too hard to fix even 
from an hotel ;)

Thank you very much for discovering this... and especially for providing a 
trigger script.

Original comment by anti...@gmail.com on 3 Sep 2011 at 9:42

GoogleCodeExporter commented 8 years ago
I forgot to add that the bug is present into unstable as well.

Original comment by anti...@gmail.com on 3 Sep 2011 at 9:43

GoogleCodeExporter commented 8 years ago
The following script also triggers the bug:
-----------------------------
#!/bin/sh

./redis-cli -r 2 BRPOPLPUSH a b 0 &
./redis-cli -r 1 BRPOPLPUSH b c 0 &
sleep 1
./redis-cli -r 1 LPUSH a data &

wait
-----------------------------

It shows even more clearly what is going on. The problem is a small bug that is 
triggered when you chain BRPOPLPUSH with other blocking commands like above.

The following change appears to be a small and clean fix for this: 
https://github.com/hampus/redis/commit/c47d152c8d96415de1af994b1a4bb3e0347caef3.
 It's probably just a typo that hasn't been discovered before.

I may have missed something, because I haven't been able to put much time into 
this today. Neither of the scripts trigger any assertions anymore and all the 
tests work, at least. Pieter, perhaps you could have a look at the patch?

Cheers,
Hampus Wessman

Original comment by hampus.w...@gmail.com on 7 Sep 2011 at 5:30

GoogleCodeExporter commented 8 years ago
Thank you Hampus, finally monday I'll be back at work and will check/merge this 
fix as well.

Original comment by anti...@gmail.com on 10 Sep 2011 at 11:44

GoogleCodeExporter commented 8 years ago
Have a nice time until then, Salvatore!

Original comment by hampus.w...@gmail.com on 10 Sep 2011 at 12:13

GoogleCodeExporter commented 8 years ago
Thanks Hampus! Back finally :) Definitely a typo, helped by the fact that "c" 
and "origclient" are lame names...

Btw now that is merged, writing a test right now!

Salvatore

Original comment by anti...@gmail.com on 12 Sep 2011 at 8:33

GoogleCodeExporter commented 8 years ago
regression added to 2.4 and unstable, closing the issue, thanks everybody.

We'll release a new 2.4 RC and 2.2.13 containing this fix today.

Salvatore

Original comment by anti...@gmail.com on 12 Sep 2011 at 8:57

GoogleCodeExporter commented 8 years ago
Thank you, Salvatore!

I'm glad I took part.

Stan.

Original comment by senotru...@gmail.com on 12 Sep 2011 at 9:01