ewdevel opened this issue 12 years ago.
What platform are you running on?
How long is it hanging?
This is an example run. The IP change happens after about 10 seconds.
[root@moj-centosik examples]# time ./amqp_producer 10.102.44.44 5672 1 20
1001 ms: Sent 3 - 3 since last report (2 Hz)
2000 ms: Sent 4 - 1 since last report (1 Hz)
3000 ms: Sent 5 - 1 since last report (1 Hz)
4001 ms: Sent 6 - 1 since last report (0 Hz)
5000 ms: Sent 7 - 1 since last report (1 Hz)
6000 ms: Sent 8 - 1 since last report (1 Hz)
7001 ms: Sent 9 - 1 since last report (0 Hz)
8000 ms: Sent 10 - 1 since last report (1 Hz)
9001 ms: Sent 11 - 1 since last report (0 Hz)
10001 ms: Sent 12 - 1 since last report (1 Hz)
11000 ms: Sent 13 - 1 since last report (1 Hz)
12000 ms: Sent 14 - 1 since last report (1 Hz)
13001 ms: Sent 15 - 1 since last report (0 Hz)
14000 ms: Sent 16 - 1 since last report (1 Hz)
15000 ms: Sent 17 - 1 since last report (1 Hz)
16001 ms: Sent 18 - 1 since last report (0 Hz)
17001 ms: Sent 19 - 1 since last report (1 Hz)
18000 ms: Sent 20 - 1 since last report (1 Hz)
PRODUCER - Message count: 20
Total time, milliseconds: 19001
Overall messages-per-second: 1.05252
Closing channel: Connection timed out
real 17m32.644s
user 0m0.053s
sys 0m0.023s
Without the IP change, it exits correctly after 19 seconds.
Kernel: 2.6.32. Architecture: x86_64. Distribution: CentOS 6.2.
My short reply: the library is behaving correctly; the socket should eventually time out when the client IP changes.
Slightly longer reply: there's a usability problem if you have to wait 17 minutes for it to time out, though I'm unsure what the best course of action is. Setting a timeout can be problematic on a slow connection; it's usually best to let the OS figure things out and then notify you when things change.
Also, changing the IP address while connected should be a somewhat rare event (though it does happen: say you're on a laptop and move to a different WiFi base station, your IP may change).
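If you do want an explicit bound instead of waiting on the OS, one option is to set per-socket timeouts on the connection's file descriptor. The sketch below is a minimal illustration, not something rabbitmq-c does itself. It assumes you can get the descriptor (rabbitmq-c exposes `amqp_get_sockfd()` in the versions I'm aware of; check yours), and `set_socket_timeouts` is a hypothetical helper name. `SO_RCVTIMEO`/`SO_SNDTIMEO` bound how long a blocking `recv()`/`send()` waits; `TCP_USER_TIMEOUT` bounds how long sent data may stay unacknowledged, but it was added in Linux 2.6.37, so the 2.6.32 kernel above may not support it. How the library reacts to the resulting `EAGAIN` depends on the version, and too small a value can break slow but healthy connections.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Hypothetical helper: bound blocking send()/recv() calls on `fd` to
 * `seconds` and, where the headers define it, bound how long transmitted
 * data may remain unacknowledged. Returns 0 on success, -1 on failure
 * (errno is left as set by setsockopt()). */
static int set_socket_timeouts(int fd, int seconds) {
    struct timeval tv;
    tv.tv_sec = seconds;
    tv.tv_usec = 0;

    /* A blocking recv() waiting longer than this fails with EAGAIN/EWOULDBLOCK. */
    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv) != 0)
        return -1;
    /* The same bound for a blocking send() that cannot make progress. */
    if (setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof tv) != 0)
        return -1;

#ifdef TCP_USER_TIMEOUT
    /* Milliseconds that sent data may stay unacknowledged before the kernel
     * aborts the connection (Linux >= 2.6.37). Older kernels reject the
     * option, so a failure here is deliberately treated as non-fatal. */
    int user_timeout_ms = seconds * 1000;
    (void)setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
                     &user_timeout_ms, sizeof user_timeout_ms);
#endif
    return 0;
}
```

The two `SO_*` options only help while the application is blocked in a call; `TCP_USER_TIMEOUT` is the one that shortens the kernel's own give-up time for unacknowledged data.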
Quick brain-dump based on the research I've done:
When you change the IP address of either the local or the remote machine, existing connections should become invalid. This should produce an error at the socket level (from send() or recv()), and it does; it just takes 17 minutes to time out.
The reason the error doesn't show up around the 10th message is that the default TCP send buffer on Linux is 128 KB, and we're sending 256 B messages (plus AMQP overhead, which isn't huge, maybe 1 KB in this case). We never fill that buffer (at least in the run above), so the data just sits in it until the connection finally fails.
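A rough estimate of where a figure like 17 minutes comes from, assuming the connection is stuck retransmitting unacknowledged data and Linux is using its defaults (`net.ipv4.tcp_retries2 = 15`, minimum RTO 200 ms, maximum RTO 120 s). The RTO doubles after each retransmission until it reaches the 120 s cap (0.2 s × 2^9 = 102.4 s is the last doubled value below the cap), giving 10 doubling intervals followed by 15 − 9 = 6 capped intervals:

$$
T \approx \sum_{k=0}^{9} 0.2\,\mathrm{s}\cdot 2^{k} + (15-9)\cdot 120\,\mathrm{s}
= 0.2\,\mathrm{s}\,(2^{10}-1) + 720\,\mathrm{s}
= 204.6\,\mathrm{s} + 720\,\mathrm{s}
= 924.6\,\mathrm{s} \approx 15.4\ \mathrm{min}
$$

The kernel documentation treats this value as a lower bound (the real RTO depends on the measured round-trip time), so the observed `real 17m32.644s` is in the same range. This is an estimate, not a measurement of that run.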
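The buffer arithmetic can be checked with a small sketch. `messages_that_fit` and `current_sndbuf` are hypothetical helper names; the 128 KB and 256 B figures come from the comment above, and the per-message overhead is a parameter because the exact AMQP framing cost isn't pinned down here. `getsockopt(SO_SNDBUF)` reports what the kernel actually uses for a given socket, which may differ from the quoted default.

```c
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical helper: how many messages of `payload_bytes`, each carrying
 * `overhead_bytes` of framing, fit into a send buffer of `sndbuf_bytes`. */
static size_t messages_that_fit(size_t sndbuf_bytes, size_t payload_bytes,
                                size_t overhead_bytes) {
    size_t per_message = payload_bytes + overhead_bytes;
    if (per_message == 0)
        return 0; /* avoid dividing by zero on a degenerate input */
    return sndbuf_bytes / per_message;
}

/* Hypothetical helper: the send-buffer size the kernel reports for `fd`,
 * or -1 if getsockopt() fails (for example, on an invalid descriptor). */
static int current_sndbuf(int fd) {
    int value = 0;
    socklen_t len = sizeof value;
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &value, &len) != 0)
        return -1;
    return value;
}
```

With a 128 KB buffer, `messages_that_fit(128 * 1024, 256, 0)` is 512, so the 20 messages in the run (about 5 KB of payload) come nowhere near filling it, and `send()` keeps succeeding while the data waits to be acknowledged.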
I had the same issue on iOS when switching between WiFi networks, or between a WiFi network and a cellular connection.
Basically, I check that the network adapter is still valid (i.e. read the IP address if possible and check it every couple of seconds in a loop). If it changes, pause whatever your app is doing, tear down the rabbitmq-c client, and reinitialize it so it reconnects with the new IP.
@alanxz So, how do I fix this? I could try to make a fix, but where in the library does this timeout happen?
By the way, an IP change is not a rare event. Often there is a load balancer (e.g. keepalived) in front of RabbitMQ, and it can switch IP addresses without notifying the client.
If I start a rabbitmq-c client, connect successfully, and then change the IP address of the machine, rabbitmq-c hangs before exiting.
Easiest way to reproduce:
Let's assume:
Start the example:
Then, before it exits, change the IP address of the rabbitmq-c machine to 192.168.2.4.
If you change the IP while the producer is running, the example hangs on this line:
Without the IP change, it exits immediately, so I think it's a bug.