amqp / rhea

A reactive messaging library based on the AMQP protocol
Apache License 2.0
280 stars 79 forks source link

Questions: Connection recovery/reconnect #92

Open amarzavery opened 6 years ago

amarzavery commented 6 years ago
  1. Is it true that a connection_close event will always follow by disconnected event? If yes, then why do we need two events? What is the best practice for a consumer of rhea? Should the consumer be listening to both connection_close and disconnected? What is it that you would expect a consumer to differently in both the handlers?
  2. Due to an idle timeout enforced by Azure EventHubs (not related to the heartbeat/keepAlive concept of AMQP) when a link is inactive (no pending receive or a message being sent), or when the connection does not have any sender/receiver, the connection is forcibly disconnected. Since the reconnect limit to 100; rhea successfully establishes the connection. However, it does not reconnect the sender/receiver links that were present on the connection. Is this the expected behavior?
grs commented 6 years ago
  1. The connection_close is raised when the peer sends a close frame. The disconnected event is raised when the peer closes the socket. I could change it so as not to emit a disconnected if the connection has already been closed. While at present you will always get a disconnected event after a connection_close (unless the peer does not close the socket), you may not always get a connection_close before a disconnected event. I would use the disconnected event to reset any state related to indoubt messages.

  2. If the connection is closed with an error considered non-fatal (i.e. listed in the non_fata_errors option), then I would expect any links that were not closed prior to the connection_close to be re-established.

amarzavery commented 6 years ago

A sample log of events that happen during idle timeout:


1.   rhea:frames [connection-1] RECV: '{"size":268,"type":0,"channel":0,"performative":{"type":"detach#16","closed":true,"error":{"type":"error#1d","condition":"amqp:connection:forced","description":"The connection was inactive for more than the allowed 300000 milliseconds and is closed by container \'LinkTracker\'. TrackingId:2fe3aa79d13842e9857a2b4687f19e74_G10, SystemTracker:gateway5, Timestamp:6/27/2018 3:35:24 AM"}}}' +1m
2.   rhea:events Link got event: sender_error +5m
3.   azure:event-hubs:cbs An error occurred on the cbs sender link.. { ConnectionForcedError: The connection was inactive for more than the allowed 300000 milliseconds and is closed by container 'LinkTracker'. TrackingId:2fe3aa79d13842e9857a2b4687f19e74_G10, SystemTracker:gateway5, Timestamp:6/27/2018 3:35:24 AM
4.   azure:event-hubs:cbs     at Object.translate (/Users/amarz/sdk/rh/final/azure-event-hubs-node/client/examples/node_modules/azure-event-hubs/lib/errors.ts:357:19)
5.   azure:event-hubs:cbs     at Sender._cbsSenderReceiverLink.sender.on (/Users/amarz/sdk/rh/final/azure-event-hubs-node/client/examples/node_modules/azure-event-hubs/lib/cbs.ts:53:25)
6.   azure:event-hubs:cbs     at emitOne (events.js:116:13)
7.   azure:event-hubs:cbs     at Sender.emit (events.js:211:7)
8.   azure:event-hubs:cbs     at Sender.link.dispatch (/Users/amarz/sdk/rh/final/azure-event-hubs-node/client/examples/node_modules/rhea/lib/link.js:59:37)
9.   azure:event-hubs:cbs     at Sender.link.on_detach (/Users/amarz/sdk/rh/final/azure-event-hubs-node/client/examples/node_modules/rhea/lib/link.js:143:32)
10.   azure:event-hubs:cbs     at Session.on_detach (/Users/amarz/sdk/rh/final/azure-event-hubs-node/client/examples/node_modules/rhea/lib/session.js:662:27)
11.   azure:event-hubs:cbs     at Connection.(anonymous function) [as on_detach] (/Users/amarz/sdk/rh/final/azure-event-hubs-node/client/examples/node_modules/rhea/lib/connection.js:656:30)
12.   azure:event-hubs:cbs     at c.dispatch (/Users/amarz/sdk/rh/final/azure-event-hubs-node/client/examples/node_modules/rhea/lib/types.js:904:33)
13.   azure:event-hubs:cbs     at Transport.read (/Users/amarz/sdk/rh/final/azure-event-hubs-node/client/examples/node_modules/rhea/lib/transport.js:95:36)
14.   azure:event-hubs:cbs   name: 'ConnectionForcedError',
15.   azure:event-hubs:cbs   translated: true,
16.   azure:event-hubs:cbs   retryable: false,
17.   azure:event-hubs:cbs   condition: 'amqp:connection:forced' } +5m
18.   rhea:events Link got event: sender_close +21ms
19.   rhea:events Session got event: sender_close +1ms
20.   rhea:events Connection got event: sender_close +0ms
21.   rhea:events Container got event: sender_close +0ms
22.   rhea:io [connection-1] read 571 bytes +76ms
23.   rhea:io [connection-1] got frame of size 275 +1ms
24.   rhea:frames [connection-1] RECV: '{"size":275,"type":0,"channel":0,"performative":{"type":"detach#16","handle":1,"closed":true,"error":{"type":"error#1d","condition":"amqp:connection:forced","description":"The connection was inactive for more than the allowed 300000 milliseconds and is closed by container \'LinkTracker\'. TrackingId:2fe3aa79d13842e9857a2b4687f19e74_G10, SystemTracker:gateway5, Timestamp:6/27/2018 3:35:24 AM"}}}' +77ms
25.   rhea:events Link got event: receiver_error +54ms
26.   azure:event-hubs:cbs An error occurred on the cbs receiver link.. { ConnectionForcedError: The connection was inactive for more than the allowed 300000 milliseconds and is closed by container 'LinkTracker'. TrackingId:2fe3aa79d13842e9857a2b4687f19e74_G10, SystemTracker:gateway5, Timestamp:6/27/2018 3:35:24 AM
27.   azure:event-hubs:cbs     at Object.translate (/Users/amarz/sdk/rh/final/azure-event-hubs-node/client/examples/node_modules/azure-event-hubs/lib/errors.ts:357:19)
28.   azure:event-hubs:cbs     at Receiver._cbsSenderReceiverLink.receiver.on (/Users/amarz/sdk/rh/final/azure-event-hubs-node/client/examples/node_modules/azure-event-hubs/lib/cbs.ts:57:25)
29.   azure:event-hubs:cbs     at emitOne (events.js:116:13)
30.   azure:event-hubs:cbs     at Receiver.emit (events.js:211:7)
31.   azure:event-hubs:cbs     at Receiver.link.dispatch (/Users/amarz/sdk/rh/final/azure-event-hubs-node/client/examples/node_modules/rhea/lib/link.js:59:37)
32.   azure:event-hubs:cbs     at Receiver.link.on_detach (/Users/amarz/sdk/rh/final/azure-event-hubs-node/client/examples/node_modules/rhea/lib/link.js:143:32)
33.   azure:event-hubs:cbs     at Session.on_detach (/Users/amarz/sdk/rh/final/azure-event-hubs-node/client/examples/node_modules/rhea/lib/session.js:662:27)
34.   azure:event-hubs:cbs     at Connection.(anonymous function) [as on_detach] (/Users/amarz/sdk/rh/final/azure-event-hubs-node/client/examples/node_modules/rhea/lib/connection.js:656:30)
35.   azure:event-hubs:cbs     at c.dispatch (/Users/amarz/sdk/rh/final/azure-event-hubs-node/client/examples/node_modules/rhea/lib/types.js:904:33)
36.   azure:event-hubs:cbs     at Transport.read (/Users/amarz/sdk/rh/final/azure-event-hubs-node/client/examples/node_modules/rhea/lib/transport.js:95:36)
37.   azure:event-hubs:cbs   name: 'ConnectionForcedError',
38.   azure:event-hubs:cbs   translated: true,
39.   azure:event-hubs:cbs   retryable: false,
40.   azure:event-hubs:cbs   condition: 'amqp:connection:forced' } +72ms
41.   rhea:events Link got event: receiver_close +33ms
42.   rhea:events Session got event: receiver_close +0ms
43.   rhea:events Connection got event: receiver_close +1ms
44.   rhea:events Container got event: receiver_close +0ms
45.   rhea:io [connection-1] got frame of size 15 +34ms
46.   rhea:frames [connection-1] RECV: '{"size":15,"type":0,"channel":0,"performative":{"type":"end#17"}}' +34ms
47.   rhea:events Session got event: session_close +0ms
48.   rhea:events Connection got event: session_close +0ms
49.   rhea:events Container got event: session_close +0ms
50.   rhea:io [connection-1] got frame of size 15 +0ms
51.   rhea:frames [connection-1] RECV: '{"size":15,"type":0,"channel":1,"performative":{"type":"end#17"}}' +0ms
52.   rhea:events Session got event: session_close +1ms
53.   rhea:events Connection got event: session_close +0ms
54.   rhea:events Container got event: session_close +0ms
55.   rhea:io [connection-1] got frame of size 266 +1ms
56.   rhea:frames [connection-1] RECV: '{"size":266,"type":0,"channel":0,"performative":{"type":"close#18","error":{"type":"error#1d","condition":"amqp:connection:forced","description":"The connection was inactive for more than the allowed 300000 milliseconds and is closed by container \'LinkTracker\'. TrackingId:2fe3aa79d13842e9857a2b4687f19e74_G10, SystemTracker:gateway5, Timestamp:6/27/2018 3:35:24 AM"}}}' +1ms
57.   rhea:events Connection got event: connection_error +0ms
58.   rhea:events Container got event: connection_error +0ms
59.   rhea:events Connection got event: connection_close +1ms
60.   rhea:events Container got event: connection_close +0ms
61.   rhea:frames [connection-1] PENDING: '{"channel":0,"type":0,"performative":["23b5217d-4c8c-6546-bd49-b8dd38cde763","eastusjs.servicebus.windows.net",null,null,null,null,null,null,null,["product","MSJSClient","version","0.2.0","platform","(x64-Darwin-17.6.0)","framework","Node/v8.11.2","user-agent","/js-event-hubs"]]}' +1ms
62.   rhea:raw [connection-1] SENT: <Buffer 00 00 00 e4 02 00 00 00 00 53 10 d0 00 00 00 d4 00 00 00 0a a1 24 32 33 62 35 32 31 37 64 2d 34 63 38 63 2d 36 35 34 36 2d 62 64 34 39 2d 62 38 64 64 ... > +1m
63.   rhea:frames [connection-1] PENDING: '{"channel":0,"type":0,"performative":[]}' +0ms
64.   rhea:raw [connection-1] SENT: <Buffer 00 00 00 0c 02 00 00 00 00 53 17 45> +0ms
65.   rhea:frames [connection-1] PENDING: '{"channel":1,"type":0,"performative":[]}' +1ms
66.   rhea:raw [connection-1] SENT: <Buffer 00 00 00 0c 02 00 00 01 00 53 17 45> +1ms
67.   rhea:frames [connection-1] PENDING: '{"channel":0,"type":0,"performative":[]}' +0ms
68.   rhea:raw [connection-1] SENT: <Buffer 00 00 00 0c 02 00 00 00 00 53 18 45> +0ms
69.   rhea:events Connection got event: disconnected +216ms
70.   rhea:events Container got event: disconnected +0ms
71.   [connection-1] disconnected 
72.   rhea:reconnect Scheduled reconnect in 100ms +0ms
73.   rhea:reconnect reconnecting... +104ms
74.   rhea:frames [connection-1] PENDING: '{"channel":0,"type":0,"performative":["23b5217d-4c8c-6546-bd49-b8dd38cde763","eastusjs.servicebus.windows.net",null,null,null,null,null,null,null,["product","MSJSClient","version","0.2.0","platform","(x64-Darwin-17.6.0)","framework","Node/v8.11.2","user-agent","/js-event-hubs"]]}' +321ms
75.   rhea:io [connection-1] connected 10.104.140.242:64831 -> 52.226.36.235:5671 +576ms
76.   rhea:io [connection-1] read 8 bytes +78ms
77.   rhea:frames [connection-1] RECV: { protocol_id: 3, major: 1, minor: 0, revision: 0 } +331ms
78.   rhea:io [connection-1] read 63 bytes +78ms
79.   rhea:io [connection-1] got frame of size 63 +0ms
80.   rhea:frames [connection-1] RECV: '{"size":63,"type":1,"performative":{"type":"sasl_mechanisms#40","sasl_server_mechanisms":["MSSBCBS","PLAIN","ANONYMOUS","EXTERNAL"]}}' +78ms
81.   rhea:frames [connection-1] PENDING: '{"channel":0,"type":1,"performative":["ANONYMOUS",{"type":"Buffer","data":[0,82,111,111,116,77,97,110,97,103,101,83,104,97,114,101,100,65,99,99,101,115,115,75,101,121]},"eastusjs.servicebus.windows.net"]}' +0ms
82.   rhea:raw [connection-1] SENT: <Buffer 00 00 00 5c 02 01 00 00 00 53 41 d0 00 00 00 4c 00 00 00 03 a3 09 41 4e 4f 4e 59 4d 4f 55 53 a0 1a 00 52 6f 6f 74 4d 61 6e 61 67 65 53 68 61 72 65 64 ... > +730ms
83.   rhea:io [connection-1] read 26 bytes +78ms
84.   rhea:io [connection-1] got frame of size 26 +0ms
85.   rhea:frames [connection-1] RECV: '{"size":26,"type":1,"performative":{"type":"sasl_outcome#44","additional_data":{"type":"Buffer","data":[87,101,108,99,111,109,101,33]}}}' +78ms
86.   rhea:raw [connection-1] SENT: <Buffer 00 00 00 e4 02 00 00 00 00 53 10 d0 00 00 00 d4 00 00 00 0a a1 24 32 33 62 35 32 31 37 64 2d 34 63 38 63 2d 36 35 34 36 2d 62 64 34 39 2d 62 38 64 64 ... > +78ms
87.   rhea:io [connection-1] read 8 bytes +78ms
88.   rhea:frames [connection-1] RECV: { protocol_id: 0, major: 1, minor: 0, revision: 0 } +78ms
89.   rhea:io [connection-1] read 71 bytes +78ms
90.   rhea:io [connection-1] got frame of size 71 +0ms
91.   rhea:frames [connection-1] RECV: '{"size":71,"type":0,"channel":0,"performative":{"type":"open#10","container_id":"cb8e741287ec47d9b04f27cebb003804_G15","max_frame_size":65536,"channel_max":4999,"idle_time_out":240000}}' +78ms
92.   rhea:events Connection got event: connection_open +749ms
93.   rhea:events Container got event: connection_open +0ms
94.   rhea:io [connection-1] read 265 bytes +60s
95.   rhea:io [connection-1] got frame of size 265 +0ms
96.   rhea:frames [connection-1] RECV: '{"size":265,"type":0,"channel":0,"performative":{"type":"close#18","error":{"type":"error#1d","condition":"amqp:connection:forced","description":"The connection was inactive for more than the allowed 60000 milliseconds and is closed by container \'LinkTracker\'. TrackingId:cb8e741287ec47d9b04f27cebb003804_G15, SystemTracker:gateway5, Timestamp:6/27/2018 3:36:25 AM"}}}' +60s
97.   rhea:events Connection got event: connection_error +60s
98.   rhea:events Container got event: connection_error +0ms
99.   rhea:events Connection got event: connection_close +0ms
100.   rhea:events Container got event: connection_close +0ms
101.   rhea:frames [connection-1] PENDING: '{"channel":0,"type":0,"performative":["23b5217d-4c8c-6546-bd49-b8dd38cde763","eastusjs.servicebus.windows.net",null,null,null,null,null,null,null,["product","MSJSClient","version","0.2.0","platform","(x64-Darwin-17.6.0)","framework","Node/v8.11.2","user-agent","/js-event-hubs"]]}' +0ms
102.   rhea:raw [connection-1] SENT: <Buffer 00 00 00 e4 02 00 00 00 00 53 10 d0 00 00 00 d4 00 00 00 0a a1 24 32 33 62 35 32 31 37 64 2d 34 63 38 63 2d 36 35 34 36 2d 62 64 34 39 2d 62 38 64 64 ... > +1m
103.   rhea:frames [connection-1] PENDING: '{"channel":0,"type":0,"performative":[]}' +29ms
104.   rhea:raw [connection-1] SENT: <Buffer 00 00 00 0c 02 00 00 00 00 53 18 45> +1ms
105.   rhea:events Connection got event: disconnected +427ms
106.   rhea:events Container got event: disconnected +0ms
107.   [connection-1] disconnected 
108.   rhea:reconnect Scheduled reconnect in 100ms +1m
109.   rhea:reconnect reconnecting... +102ms

Questions:

  1. What is rhea trying to send the connection_close and the disconnected events are received?
  2. Since amqp:connection:forced error received for idle timeout is a non fatal error, I will have to handle reconnects myself. Is there a way to let rhea know that amqp:connection:forced is not a non-fatal-error, inorder for rhea to handle reconnects?
    1. It would be nice if rhea can provide options (a function that can be used for link and underlying session recovery). There are instances in EventHub, where the receiver link is disconnected for some reason then we would like to re-establish the link by specifying the offset of the last received message as a filter in the link options. This helps us avoid receiving messages from the beginning or the previous offset. It would be nice if there is some kind of a callback or a function that can be provided for reconnect scenarios.
grs commented 6 years ago

I don't understand the first question. For the second you can set the non_fatal_errors option (on connection or container) to be an empty list (or any other list), in order to not treat amqp:connection:forced as non_fatal. However, just to clarify, the library will try to reconnect if the error is considered non-fatal. It will not reconnect if the error is considered fatal (meaning reconnecting will not help).

A further point, based on your log, is that it appears the sender and receiver are being closed by the peer with the somewhat unintuitive error of amqp:connection:forced. I suspect that is why they are not re-established. There is as yet no equivalent logic for determining whether to re-establish links that are closed by the peer with an error. You can of course handle those errors and recreate the link yourself.

grs commented 6 years ago

One way to reset the filter would be to do so in the disconnected event (using the set_source() method). Then when the link is re-established it should (hopefully) be at the right offset.

amarzavery commented 6 years ago

For the first question, I am talking about the following sequence

57.   rhea:events Connection got event: connection_error +0ms
58.   rhea:events Container got event: connection_error +0ms
59.   rhea:events Connection got event: connection_close +1ms
60.   rhea:events Container got event: connection_close +0ms
61.   rhea:frames [connection-1] PENDING: '{"channel":0,"type":0,"performative":["23b5217d-4c8c-6546-bd49-b8dd38cde763","eastusjs.servicebus.windows.net",null,null,null,null,null,null,null,["product","MSJSClient","version","0.2.0","platform","(x64-Darwin-17.6.0)","framework","Node/v8.11.2","user-agent","/js-event-hubs"]]}' +1ms
62.   rhea:raw [connection-1] SENT: <Buffer 00 00 00 e4 02 00 00 00 00 53 10 d0 00 00 00 d4 00 00 00 0a a1 24 32 33 62 35 32 31 37 64 2d 34 63 38 63 2d 36 35 34 36 2d 62 64 34 39 2d 62 38 64 64 ... > +1m
63.   rhea:frames [connection-1] PENDING: '{"channel":0,"type":0,"performative":[]}' +0ms
64.   rhea:raw [connection-1] SENT: <Buffer 00 00 00 0c 02 00 00 00 00 53 17 45> +0ms
65.   rhea:frames [connection-1] PENDING: '{"channel":1,"type":0,"performative":[]}' +1ms
66.   rhea:raw [connection-1] SENT: <Buffer 00 00 00 0c 02 00 00 01 00 53 17 45> +1ms
67.   rhea:frames [connection-1] PENDING: '{"channel":0,"type":0,"performative":[]}' +0ms
68.   rhea:raw [connection-1] SENT: <Buffer 00 00 00 0c 02 00 00 00 00 53 18 45> +0ms
69.   rhea:events Connection got event: disconnected +216ms
70.   rhea:events Container got event: disconnected +0ms
71.   [connection-1] disconnected 
72.   rhea:reconnect Scheduled reconnect in 100ms +0ms
73.   rhea:reconnect reconnecting... +104ms
74.   rhea:frames [connection-1] PENDING: '{"channel":0,"type":0,"performative":["23b5217d-4c8c-6546-bd49-b8dd38cde763","eastusjs.servicebus.windows.net",null,null,null,null,null,null,null,["product","MSJSClient","version","0.2.0","platform","(x64-Darwin-17.6.0)","framework","Node/v8.11.2","user-agent","/js-event-hubs"]]}' +321ms
75.   rhea:io [connection-1] connected 10.104.140.242:64831 -> 52.226.36.235:5671 +576ms
76.   rhea:io [connection-1] read 8 bytes +78ms

As you can see, after connection_close on line 60 and before disconnected on line 71. Basically, from line 61-70, it seems that rhea is trying to send something to the service. I am trying to understand a) what is it sending? b) why is it sending that info?

amarzavery commented 6 years ago

It seems that rhea is trying to connect even before the disconnected event is received.

If you compare the logs on line 61

61.   rhea:frames [connection-1] PENDING: '{"channel":0,"type":0,"performative":["23b5217d-4c8c-6546-bd49-b8dd38cde763","eastusjs.servicebus.windows.net",null,null,null,null,null,null,null,["product","MSJSClient","version","0.2.0","platform","(x64-Darwin-17.6.0)","framework","Node/v8.11.2","user-agent","/js-event-hubs"]]}' +1ms

with line 74 which happens after the 100ms wait for reconnection

74.   rhea:frames [connection-1] PENDING: '{"channel":0,"type":0,"performative":["23b5217d-4c8c-6546-bd49-b8dd38cde763","eastusjs.servicebus.windows.net",null,null,null,null,null,null,null,["product","MSJSClient","version","0.2.0","platform","(x64-Darwin-17.6.0)","framework","Node/v8.11.2","user-agent","/js-event-hubs"]]}' +321ms 

They are both same. That makes me think that a frame for connection was sent before the disconnected event was received. That shouldn't happen right? Am I being clear about my doubt?

grs commented 6 years ago

The logging here is not helpful, I agree. Lines 64 and 66 are session ends, line 68 is a connection close. Line 61 is as you say a connection open. That is a little odd, I would not have expected it and will look into why it is sent. However it should not affect the reconnect as that will take place over a separate socket.

grs commented 6 years ago

I've raised an issue and fixed it for the spurious open being sent before disconnection when the connection is closed by the server with an error considered non-fatal by the client: https://github.com/amqp/rhea/issues/94

amarzavery commented 6 years ago

@grs - It looks like the disconnected event is not emitted anymore after connection_close. Something that used to happen before.

Here are the logs for the same example that was run before

rhea:frames [connection-1]:0 <- detach#16 {"closed":true,"error":{"condition":"amqp:connection:forced","description":"The connection was inactive for more than the allowed 300000 milliseconds and is closed by container 'LinkTracker'. TrackingId:cb08c124f5c046169b4aae2425e1af6c_G30, SystemTracker:gateway5, Timestamp:8/8/2018 4:59:59 PM"}}  +1m
  rhea:events Link got event: sender_error +5m
  azure:amqp-common:cbs An error occurred on the cbs sender link.. { ConnectionForcedError: The connection was inactive for more than the allowed 300000 milliseconds and is closed by container 'LinkTracker'. TrackingId:cb08c124f5c046169b4aae2425e1af6c_G30, SystemTracker:gateway5, Timestamp:8/8/2018 4:59:59 PM
  azure:amqp-common:cbs     at Object.translate (/Users/amarz/sdk/rh/app/node_modules/azure-event-hubs/dist/lib/amqp-common/errors.js:398:23)
  azure:amqp-common:cbs     at Sender._cbsSenderReceiverLink.sender.registerHandler (/Users/amarz/sdk/rh/app/node_modules/azure-event-hubs/dist/lib/amqp-common/cbs.js:56:46)
  azure:amqp-common:cbs     at emitOne (events.js:116:13)
  azure:amqp-common:cbs     at Sender.emit (events.js:211:7)
  azure:amqp-common:cbs     at Sender.link.dispatch (/Users/amarz/sdk/rh/app/node_modules/rhea/lib/link.js:59:37)
  azure:amqp-common:cbs     at Sender.link.on_detach (/Users/amarz/sdk/rh/app/node_modules/rhea/lib/link.js:156:32)
  azure:amqp-common:cbs     at Session.on_detach (/Users/amarz/sdk/rh/app/node_modules/rhea/lib/session.js:695:27)
  azure:amqp-common:cbs     at Connection.(anonymous function) [as on_detach] (/Users/amarz/sdk/rh/app/node_modules/rhea/lib/connection.js:680:30)
  azure:amqp-common:cbs     at c.dispatch (/Users/amarz/sdk/rh/app/node_modules/rhea/lib/types.js:909:33)
  azure:amqp-common:cbs     at Transport.read (/Users/amarz/sdk/rh/app/node_modules/rhea/lib/transport.js:108:36)
  azure:amqp-common:cbs   name: 'ConnectionForcedError',
  azure:amqp-common:cbs   translated: true,
  azure:amqp-common:cbs   retryable: true,
  azure:amqp-common:cbs   info: null,
  azure:amqp-common:cbs   condition: 'amqp:connection:forced' } +5m
  rhea:events Link got event: sender_close +1ms
  rhea:events Session got event: sender_close +0ms
  rhea:events Connection got event: sender_close +0ms
  rhea:events Container got event: sender_close +0ms
  rhea:io [connection-1] read 548 bytes +714ms
  rhea:io [connection-1] got frame of size 268 +0ms
  rhea:frames [connection-1]:0 <- detach#16 {"handle":1,"closed":true,"error":{"condition":"amqp:connection:forced","description":"The connection was inactive for more than the allowed 300000 milliseconds and is closed by container 'LinkTracker'. TrackingId:cb08c124f5c046169b4aae2425e1af6c_G30, SystemTracker:gateway5, Timestamp:8/8/2018 4:59:59 PM"}}  +714ms
  rhea:events Link got event: receiver_error +713ms
  azure:amqp-common:cbs An error occurred on the cbs receiver link.. { ConnectionForcedError: The connection was inactive for more than the allowed 300000 milliseconds and is closed by container 'LinkTracker'. TrackingId:cb08c124f5c046169b4aae2425e1af6c_G30, SystemTracker:gateway5, Timestamp:8/8/2018 4:59:59 PM
  azure:amqp-common:cbs     at Object.translate (/Users/amarz/sdk/rh/app/node_modules/azure-event-hubs/dist/lib/amqp-common/errors.js:398:23)
  azure:amqp-common:cbs     at Receiver._cbsSenderReceiverLink.receiver.registerHandler (/Users/amarz/sdk/rh/app/node_modules/azure-event-hubs/dist/lib/amqp-common/cbs.js:60:46)
  azure:amqp-common:cbs     at emitOne (events.js:116:13)
  azure:amqp-common:cbs     at Receiver.emit (events.js:211:7)
  azure:amqp-common:cbs     at Receiver.link.dispatch (/Users/amarz/sdk/rh/app/node_modules/rhea/lib/link.js:59:37)
  azure:amqp-common:cbs     at Receiver.link.on_detach (/Users/amarz/sdk/rh/app/node_modules/rhea/lib/link.js:156:32)
  azure:amqp-common:cbs     at Session.on_detach (/Users/amarz/sdk/rh/app/node_modules/rhea/lib/session.js:695:27)
  azure:amqp-common:cbs     at Connection.(anonymous function) [as on_detach] (/Users/amarz/sdk/rh/app/node_modules/rhea/lib/connection.js:680:30)
  azure:amqp-common:cbs     at c.dispatch (/Users/amarz/sdk/rh/app/node_modules/rhea/lib/types.js:909:33)
  azure:amqp-common:cbs     at Transport.read (/Users/amarz/sdk/rh/app/node_modules/rhea/lib/transport.js:108:36)
  azure:amqp-common:cbs   name: 'ConnectionForcedError',
  azure:amqp-common:cbs   translated: true,
  azure:amqp-common:cbs   retryable: true,
  azure:amqp-common:cbs   info: null,
  azure:amqp-common:cbs   condition: 'amqp:connection:forced' } +715ms
  rhea:events Link got event: receiver_close +1ms
  rhea:events Session got event: receiver_close +0ms
  rhea:events Connection got event: receiver_close +0ms
  rhea:events Container got event: receiver_close +0ms
  rhea:io [connection-1] got frame of size 15 +1ms
  rhea:frames [connection-1]:0 <- end#17 {}  +1ms
  rhea:events Session got event: session_close +0ms
  rhea:events Connection got event: session_close +0ms
  rhea:events Container got event: session_close +0ms
  rhea:io [connection-1] got frame of size 265 +0ms
  rhea:frames [connection-1]:0 <- close#18 {"error":{"condition":"amqp:connection:forced","description":"The connection was inactive for more than the allowed 300000 milliseconds and is closed by container 'LinkTracker'. TrackingId:cb08c124f5c046169b4aae2425e1af6c_G30, SystemTracker:gateway5, Timestamp:8/8/2018 4:59:59 PM"}}  +1ms
  rhea:events Connection got event: connection_error +1ms
  rhea:events Container got event: connection_error +0ms
  rhea:events Connection got event: connection_close +0ms
  rhea:events Container got event: connection_close +0ms
  rhea:frames [connection-1]:0 -> end#17 {}  +0ms
  rhea:frames [connection-1]:0 -> close#18 {}  +0ms
amarzavery commented 6 years ago

Previously, at line 69-70 there was a disconnected event that was emitted after connection_close. This time that did not happen.

grs commented 6 years ago

I've commited a fix in https://github.com/amqp/rhea/commit/d62f131f26942b396f2aba4664c97432e683c823

amarzavery commented 6 years ago

@grs - Now I do get the disconnected event. However, when I attempt to reconnect, I get the same connection-forced error back with the same tracking Id that was received last time. It looks like the remote peer still thinks that it is the old connection. Btw, this (manual reconnect after receiving the disconnected event) used to work fine previously.

Logs for reference:


-------> Initial error <--------------
 rhea:raw [connection-1] RECV: 266 0000010a02000000005318c0fd0100531dc0f703a316616d71703a636f6e6e656374696f6e3a666f72636564a1db54686520636f6e6e656374696f6e2077617320696e61637469766520666f72206d6f7265207468616e2074686520616c6c6f77656420333030303030206d696c6c697365636f6e647320616e6420697320636c6f73656420627920636f6e7461696e657220274c696e6b547261636b6572272e20547261636b696e6749643a62323165383466396264363534393661393939386138353533633630653763615f4731332c2053797374656d547261636b65723a67617465776179352c2054696d657374616d703a382f31302f3230313820383a35303a303620504d40 +0ms
  rhea:frames [connection-1]:0 <- close#18 {"error":{"condition":"amqp:connection:forced","description":"The connection was inactive for more than the allowed 300000 milliseconds and is closed by container 'LinkTracker'. TrackingId:b21e84f9bd65496a9998a8553c60e7ca_G13, SystemTracker:gateway5, Timestamp:8/10/2018 8:50:06 PM"}}  +0ms
  rhea:events [connection-1] Connection got event: connection_error +1ms
  rhea:events [082aebdf-b22c-0843-b703-f187b42d3cf7] Container got event: connection_error +0ms

rhea:events [connection-1] Connection got event: connection_close +0ms
  azure:event-hubs:connectionContext [connection-1] connection close happened. +5m
  rhea:frames [connection-1]:0 -> detach#16 {"handle":1,"closed":true}  +2ms
  rhea:raw [connection-1] SENT: 23 0000001702000000005316d00000000700000002520141 +2ms
  rhea:frames [connection-1]:0 -> end#17 {}  +0ms
  rhea:raw [connection-1] SENT: 12 0000000c0200000000531745 +0ms
  rhea:frames [connection-1]:0 -> close#18 {}  +0ms
  rhea:raw [connection-1] SENT: 12 0000000c0200000000531845 +0ms

--------------> disconnected event <----------------------------
  rhea:events [connection-1] Connection got event: disconnected +330ms
  azure:event-hubs:error [connection-1] Error (context.connection.error) occurred on the amqp connection: c {
  azure:event-hubs:error   value: 
  azure:event-hubs:error    [ Typed { type: [Object], value: 'amqp:connection:forced' },
  azure:event-hubs:error      Typed {
  azure:event-hubs:error        type: [Object],
  azure:event-hubs:error        value: 'The connection was inactive for more than the allowed 300000 milliseconds and is closed by container \'LinkTracker\'. TrackingId:b21e84f9bd65496a9998a8553c60e7ca_G13, SystemTracker:gateway5, Timestamp:8/10/2018 8:50:06 PM' },
  azure:event-hubs:error      Typed { type: [Object], value: null } ] } +5m
  azure:event-hubs:error [connection-1] Error (context.error) occurred on the amqp connection: { Error: read ECONNRESET
  azure:event-hubs:error     at _errnoException (util.js:992:11)
  azure:event-hubs:error     at TLSWrap.onread (net.js:618:25) code: 'ECONNRESET', errno: 'ECONNRESET', syscall: 'read' } +0ms

....still doing something after 5 minutes. We should get here even if the connection goes down and try to reconnect....

  azure:event-hubs:error [connection-1] Receiver '7b02fa30-4c01-494f-add1-be3703501150' with address 'test1/ConsumerGroups/$default/Partitions/0' is open? -> undefined +8s
  azure:event-hubs:receiverbatching [connection-1] Receiver '7b02fa30-4c01-494f-add1-be3703501150', setting the prefetch count to 0. +5m
  azure:event-hubs:error [connection-1] Receiver '7b02fa30-4c01-494f-add1-be3703501150' with address 'test1/ConsumerGroups/$default/Partitions/0' is open? -> undefined +1ms
  azure:event-hubs:error [connection-1] The receiver '7b02fa30-4c01-494f-add1-be3703501150' with address 'test1/ConsumerGroups/$default/Partitions/0' is not open and is not currently establishing itself. Hence let's try to connect. +0ms
  azure:event-hubs:linkEntity [connection-1] Acquiring cbs lock: 'negotiateCbs-dfd05c53-a7dc-474c-a4db-0a21696f8bc7' for creating the cbs session while creating the BatchingReceiver: '7b02fa30-4c01-494f-add1-be3703501150' with address: 'test1/ConsumerGroups/$default/Partitions/0'. +5m

----------------------> Reconnect attempt <-----------------------

  azure:amqp-common:cbs The CBS client is trying to establish an AMQP connection. +5m
  rhea:io [connection-1] connected 10.104.141.137:57377 -> 52.226.36.235:5671 +10s
  rhea:frames [connection-1] -> { protocol_id: 3, major: 1, minor: 0, revision: 0 } +10s
  rhea:io [connection-1] read 8 bytes +271ms
  rhea:frames [connection-1] <- { protocol_id: 3, major: 1, minor: 0, revision: 0 } +271ms
  rhea:io [connection-1] read 63 bytes +77ms
  rhea:io [connection-1] got frame of size 63 +0ms
  rhea:raw [connection-1] RECV: 63 0000003f02010000005340c03201e02f04b3000000074d53534243425300000005504c41494e00000009414e4f4e594d4f55530000000845585445524e414c +10s
  rhea:frames [connection-1]:undefined <- sasl_mechanisms#40 {"sasl_server_mechanisms":["MSSBCBS","PLAIN","ANONYMOUS","EXTERNAL"]}  +78ms
  rhea:frames [connection-1]:0 -> function Typed(type, value, code, descriptor) {
    this.type = type;
    this.value = value;
    if (code) {
        this.array_constructor = {'typecode':code};
        if (descriptor) {
            this.array_constructor.descriptor = descriptor;
        }
    }
} ["ANONYMOUS",{"type":"Buffer","data":[0,82,111,111,116,77,97,110,97,103,101,83,104,97,114,101,100,65,99,99,101,115,115,75,101,121]},"eastusjs.servicebus.windows.net"]  +0ms
  rhea:raw [connection-1] SENT: 92 0000005c02010000005341d00000004c00000003a309414e4f4e594d4f5553a01a00526f6f744d616e6167655368617265644163636573734b6579a11f6561737475736a732e736572766963656275732e77696e646f77732e6e6574 +0ms
  rhea:io [connection-1] read 26 bytes +79ms
  rhea:io [connection-1] got frame of size 26 +0ms
  rhea:raw [connection-1] RECV: 26 0000001a02010000005344c00d025000a00857656c636f6d6521 +78ms
  rhea:frames [connection-1]:undefined <- sasl_outcome#44 {"additional_data":{"type":"Buffer","data":[87,101,108,99,111,109,101,33]}}  +78ms
  rhea:frames [connection-1]:0 -> open#10 {"transport":"tls","host":"eastusjs.servicebus.windows.net","username":"RootManageSharedAccessKey","port":5671,"servername":"eastusjs.servicebus.windows.net","id":"connection-1","container_id":"082aebdf-b22c-0843-b703-f187b42d3cf7","hostname":"eastusjs.servicebus.windows.net","properties":{"product":"MSJSClient","version":"0.2.6","platform":"(x64-Darwin-17.7.0)","framework":"Node/v8.11.2","user-agent":"/js-event-hubs"}}  +0ms
  rhea:raw [connection-1] SENT: 228 000000e402000000005310d0000000d40000000aa12430383261656264662d623232632d303834332d623730332d663138376234326433636637a11f6561737475736a732e736572766963656275732e77696e646f77732e6e657440404040404040d10000007d0000000aa30770726f64756374a10a4d534a53436c69656e74a30776657273696f6ea105302e322e36a308706c6174666f726da113287836342d44617277696e2d31372e372e3029a3096672616d65776f726ba10c4e6f64652f76382e31312e32a30a757365722d6167656e74a10e2f6a732d6576656e742d68756273 +1ms

---------> Disconnected event with same error and tracking id from before <-------

  rhea:events [connection-1] Connection got event: disconnected +10s
  azure:event-hubs:error [connection-1] Error (context.connection.error) occurred on the amqp connection: c {
  azure:event-hubs:error   value: 
  azure:event-hubs:error    [ Typed { type: [Object], value: 'amqp:connection:forced' },
  azure:event-hubs:error      Typed {
  azure:event-hubs:error        type: [Object],
  azure:event-hubs:error        value: 'The connection was inactive for more than the allowed 300000 milliseconds and is closed by container \'LinkTracker\'. TrackingId:b21e84f9bd65496a9998a8553c60e7ca_G13, SystemTracker:gateway5, Timestamp:8/10/2018 8:50:06 PM' },
  azure:event-hubs:error      Typed { type: [Object], value: null } ] } +3s
  azure:event-hubs:error [connection-1] connection.close() was not called from the sdk and there were some sender or receiver links or both. We should reconnect. +0ms
  rhea-promise:error [connection-1] Error occurred while establishing amqp connection: c {
  rhea-promise:error   value: 
  rhea-promise:error    [ Typed { type: [Object], value: 'amqp:connection:forced' },
  rhea-promise:error      Typed {
  rhea-promise:error        type: [Object],
  rhea-promise:error        value: 'The connection was inactive for more than the allowed 300000 milliseconds and is closed by container \'LinkTracker\'. TrackingId:b21e84f9bd65496a9998a8553c60e7ca_G13, SystemTracker:gateway5, Timestamp:8/10/2018 8:50:06 PM' },
  rhea-promise:error      Typed { type: [Object], value: null } ] } +5m
  azure:amqp-common:error [connection-1] An error occured while establishing the cbs links: { ConnectionForcedError: The connection was inactive for more than the allowed 300000 milliseconds and is closed by container 'LinkTracker'. TrackingId:b21e84f9bd65496a9998a8553c60e7ca_G13, SystemTracker:gateway5, Timestamp:8/10/2018 8:50:06 PM
  azure:amqp-common:error     at Object.translate (/Users/amarz/sdk/rh/final/azure-event-hubs-node/client/lib/amqp-common/errors.ts:454:19)
  azure:amqp-common:error     at CbsClient.init (/Users/amarz/sdk/rh/final/azure-event-hubs-node/client/lib/amqp-common/cbs.ts:98:13)
  azure:amqp-common:error     at <anonymous>
  azure:amqp-common:error     at process._tickDomainCallback (internal/process/next_tick.js:228:7)
  azure:amqp-common:error   name: 'ConnectionForcedError',
  azure:amqp-common:error   translated: true,
  azure:amqp-common:error   retryable: true,
  azure:amqp-common:error   info: null,
  azure:amqp-common:error   condition: 'amqp:connection:forced' } +11s

What could be going wrong over here?

grs commented 6 years ago

How are you initiating reconnect?

amarzavery commented 6 years ago

In my sdk I preserve the reference to rhea's connection object because (I initially call const connection = rhea.create_connection(options). Now when the connection gets disconnected, I simply call connection.connect() again to reconnect. You can see in the logs that the connection-id is still [Connection-1] when I am trying to reconnect.

To compare the logs of

initial connection from the start

 azure:event-hubs:linkEntity [connection-1] Acquiring cbs lock: 'negotiateCbs-dfd05c53-a7dc-474c-a4db-0a21696f8bc7' for creating the cbs session while creating the BatchingReceiver: '0e365aa8-170a-40c5-8f8e-0ecd708cafc7' with address: 'test1/ConsumerGroups/$default/Partitions/0'. +0ms
  azure:amqp-common:cbs The CBS client is trying to establish an AMQP connection. +0ms
  rhea:io [connection-1] connected 10.104.141.137:57342 -> 52.226.36.235:5671 +0ms
  rhea:frames [connection-1] -> { protocol_id: 3, major: 1, minor: 0, revision: 0 } +0ms
  rhea:io [connection-1] read 8 bytes +81ms
  rhea:frames [connection-1] <- { protocol_id: 3, major: 1, minor: 0, revision: 0 } +80ms
  rhea:io [connection-1] read 63 bytes +75ms
  rhea:io [connection-1] got frame of size 63 +0ms
  rhea:raw [connection-1] RECV: 63 0000003f02010000005340c03201e02f04b3000000074d53534243425300000005504c41494e00000009414e4f4e594d4f55530000000845585445524e414c +0ms
  rhea:frames [connection-1]:undefined <- sasl_mechanisms#40 {"sasl_server_mechanisms":["MSSBCBS","PLAIN","ANONYMOUS","EXTERNAL"]}  +77ms
  rhea:frames [connection-1]:0 -> function Typed(type, value, code, descriptor) {
    this.type = type;
    this.value = value;
    if (code) {
        this.array_constructor = {'typecode':code};
        if (descriptor) {
            this.array_constructor.descriptor = descriptor;
        }
    }
} ["ANONYMOUS",{"type":"Buffer","data":[0,82,111,111,116,77,97,110,97,103,101,83,104,97,114,101,100,65,99,99,101,115,115,75,101,121]},"eastusjs.servicebus.windows.net"]  +2ms
  rhea:raw [connection-1] SENT: 92 0000005c02010000005341d00000004c00000003a309414e4f4e594d4f5553a01a00526f6f744d616e6167655368617265644163636573734b6579a11f6561737475736a732e736572766963656275732e77696e646f77732e6e6574 +4ms
  rhea:io [connection-1] read 26 bytes +82ms
  rhea:io [connection-1] got frame of size 26 +0ms
  rhea:raw [connection-1] RECV: 26 0000001a02010000005344c00d025000a00857656c636f6d6521 +78ms
  rhea:frames [connection-1]:undefined <- sasl_outcome#44 {"additional_data":{"type":"Buffer","data":[87,101,108,99,111,109,101,33]}}  +78ms
  rhea:frames [connection-1] -> { protocol_id: 0, major: 1, minor: 0, revision: 0 } +1ms
  rhea:frames [connection-1]:0 -> open#10 {"transport":"tls","host":"eastusjs.servicebus.windows.net","username":"RootManageSharedAccessKey","port":5671,"servername":"eastusjs.servicebus.windows.net","id":"connection-1","container_id":"082aebdf-b22c-0843-b703-f187b42d3cf7","hostname":"eastusjs.servicebus.windows.net","properties":{"product":"MSJSClient","version":"0.2.6","platform":"(x64-Darwin-17.7.0)","framework":"Node/v8.11.2","user-agent":"/js-event-hubs"}}  +1ms
  rhea:raw [connection-1] SENT: 228 000000e402000000005310d0000000d40000000aa12430383261656264662d623232632d303834332d623730332d663138376234326433636637a11f6561737475736a732e736572766963656275732e77696e646f77732e6e657440404040404040d10000007d0000000aa30770726f64756374a10a4d534a53436c69656e74a30776657273696f6ea105302e322e36a308706c6174666f726da113287836342d44617277696e2d31372e372e3029a3096672616d65776f726ba10c4e6f64652f76382e31312e32a30a757365722d6167656e74a10e2f6a732d6576656e742d68756273 +2ms
  rhea:io [connection-1] read 8 bytes +78ms
  rhea:frames [connection-1] <- { protocol_id: 0, major: 1, minor: 0, revision: 0 } +76ms
  rhea:io [connection-1] read 71 bytes +76ms
  rhea:io [connection-1] got frame of size 71 +0ms
  rhea:raw [connection-1] RECV: 71 0000004702000000005310c03a0aa12462323165383466396264363534393661393939386138353533633630653763615f473133407000010000601387700003a9804040404040 +152ms
  rhea:frames [connection-1]:0 <- open#10 {"container_id":"b21e84f9bd65496a9998a8553c60e7ca_G13","max_frame_size":65536,"channel_max":4999,"idle_time_out":240000}  +77ms
  rhea:events [connection-1] Connection got event: connection_open +0ms

v/s the logs of reconnect attempt after getting disconnected where sending the #open10 frame for establishing the connection ends in the connection receiving disconnected event.

azure:event-hubs:linkEntity [connection-1] Acquiring cbs lock: 'negotiateCbs-dfd05c53-a7dc-474c-a4db-0a21696f8bc7' for creating the cbs session while creating the BatchingReceiver: '7b02fa30-4c01-494f-add1-be3703501150' with address: 'test1/ConsumerGroups/$default/Partitions/0'. +5m
  azure:amqp-common:cbs The CBS client is trying to establish an AMQP connection. +5m
  rhea:io [connection-1] connected 10.104.141.137:57377 -> 52.226.36.235:5671 +10s
  rhea:frames [connection-1] -> { protocol_id: 3, major: 1, minor: 0, revision: 0 } +10s
  rhea:io [connection-1] read 8 bytes +271ms
  rhea:frames [connection-1] <- { protocol_id: 3, major: 1, minor: 0, revision: 0 } +271ms
  rhea:io [connection-1] read 63 bytes +77ms
  rhea:io [connection-1] got frame of size 63 +0ms
  rhea:raw [connection-1] RECV: 63 0000003f02010000005340c03201e02f04b3000000074d53534243425300000005504c41494e00000009414e4f4e594d4f55530000000845585445524e414c +10s
  rhea:frames [connection-1]:undefined <- sasl_mechanisms#40 {"sasl_server_mechanisms":["MSSBCBS","PLAIN","ANONYMOUS","EXTERNAL"]}  +78ms
  rhea:frames [connection-1]:0 -> function Typed(type, value, code, descriptor) {
    this.type = type;
    this.value = value;
    if (code) {
        this.array_constructor = {'typecode':code};
        if (descriptor) {
            this.array_constructor.descriptor = descriptor;
        }
    }
} ["ANONYMOUS",{"type":"Buffer","data":[0,82,111,111,116,77,97,110,97,103,101,83,104,97,114,101,100,65,99,99,101,115,115,75,101,121]},"eastusjs.servicebus.windows.net"]  +0ms
  rhea:raw [connection-1] SENT: 92 0000005c02010000005341d00000004c00000003a309414e4f4e594d4f5553a01a00526f6f744d616e6167655368617265644163636573734b6579a11f6561737475736a732e736572766963656275732e77696e646f77732e6e6574 +0ms
  rhea:io [connection-1] read 26 bytes +79ms
  rhea:io [connection-1] got frame of size 26 +0ms
  rhea:raw [connection-1] RECV: 26 0000001a02010000005344c00d025000a00857656c636f6d6521 +78ms
  rhea:frames [connection-1]:undefined <- sasl_outcome#44 {"additional_data":{"type":"Buffer","data":[87,101,108,99,111,109,101,33]}}  +78ms
  rhea:frames [connection-1]:0 -> open#10 {"transport":"tls","host":"eastusjs.servicebus.windows.net","username":"RootManageSharedAccessKey","port":5671,"servername":"eastusjs.servicebus.windows.net","id":"connection-1","container_id":"082aebdf-b22c-0843-b703-f187b42d3cf7","hostname":"eastusjs.servicebus.windows.net","properties":{"product":"MSJSClient","version":"0.2.6","platform":"(x64-Darwin-17.7.0)","framework":"Node/v8.11.2","user-agent":"/js-event-hubs"}}  +0ms
  rhea:raw [connection-1] SENT: 228 000000e402000000005310d0000000d40000000aa12430383261656264662d623232632d303834332d623730332d663138376234326433636637a11f6561737475736a732e736572766963656275732e77696e646f77732e6e657440404040404040d10000007d0000000aa30770726f64756374a10a4d534a53436c69656e74a30776657273696f6ea105302e322e36a308706c6174666f726da113287836342d44617277696e2d31372e372e3029a3096672616d65776f726ba10c4e6f64652f76382e31312e32a30a757365722d6167656e74a10e2f6a732d6576656e742d68756273 +1ms
  rhea:events [connection-1] Connection got event: disconnected +10s
  azure:event-hubs:error [connection-1] Error (context.connection.error) occurred on the amqp connection: c {
  azure:event-hubs:error   value: 
  azure:event-hubs:error    [ Typed { type: [Object], value: 'amqp:connection:forced' },
  azure:event-hubs:error      Typed {
  azure:event-hubs:error        type: [Object],
  azure:event-hubs:error        value: 'The connection was inactive for more than the allowed 300000 milliseconds and is closed by container \'LinkTracker\'. TrackingId:b21e84f9bd65496a9998a8553c60e7ca_G13, SystemTracker:gateway5, Timestamp:8/10/2018 8:50:06 PM' },
  azure:event-hubs:error      Typed { type: [Object], value: null } ] } +3s
  azure:event-hubs:error [connection-1] connection.close() was not called from the sdk and there were some sender or receiver links or both. We should reconnect. +0ms
  rhea-promise:error [connection-1] Error occurred while establishing amqp connection: c {
  rhea-promise:error   value: 
  rhea-promise:error    [ Typed { type: [Object], value: 'amqp:connection:forced' },
  rhea-promise:error      Typed {
  rhea-promise:error        type: [Object],
  rhea-promise:error        value: 'The connection was inactive for more than the allowed 300000 milliseconds and is closed by container \'LinkTracker\'. TrackingId:b21e84f9bd65496a9998a8553c60e7ca_G13, SystemTracker:gateway5, Timestamp:8/10/2018 8:50:06 PM' },
  rhea-promise:error      Typed { type: [Object], value: null } ] } +5m
  azure:amqp-common:error [connection-1] An error occured while establishing the cbs links: { ConnectionForcedError: The connection was inactive for more than the allowed 300000 milliseconds and is closed by container 'LinkTracker'. TrackingId:b21e84f9bd65496a9998a8553c60e7ca_G13, SystemTracker:gateway5, Timestamp:8/10/2018 8:50:06 PM
  azure:amqp-common:error     at Object.translate (/Users/amarz/sdk/rh/final/azure-event-hubs-node/client/lib/amqp-common/errors.ts:454:19)
  azure:amqp-common:error     at CbsClient.init (/Users/amarz/sdk/rh/final/azure-event-hubs-node/client/lib/amqp-common/cbs.ts:98:13)
  azure:amqp-common:error     at <anonymous>
  azure:amqp-common:error     at process._tickDomainCallback (internal/process/next_tick.js:228:7)
  azure:amqp-common:error   name: 'ConnectionForcedError',
  azure:amqp-common:error   translated: true,
  azure:amqp-common:error   retryable: true,
  azure:amqp-common:error   info: null,
  azure:amqp-common:error   condition: 'amqp:connection:forced' } +11s

I would expect the following frame from the service or atleast a frame with the connection:forced error. However, the disconnected event is emitted. Which makes me wonder whether the open frame was ever sent?

rhea:raw [connection-1] RECV: 71 0000004702000000005310c03a0aa12462323165383466396264363534393661393939386138353533633630653763615f473133407000010000601387700003a9804040404040 +152ms
  rhea:frames [connection-1]:0 <- open#10 {"container_id":"b21e84f9bd65496a9998a8553c60e7ca_G13","max_frame_size":65536,"channel_max":4999,"idle_time_out":240000}  +77ms
  rhea:events [connection-1] Connection got event: connection_open +0ms
grs commented 6 years ago

Try with https://github.com/amqp/rhea/pull/119

It looks as if the server may be disconnecting you after the sasl exchange or on the sending of the open. The error state is stale from the previous use of the connection (library managed reconnect resets the state but connect() does not).

amarzavery commented 6 years ago

Still getting the disconnected event. However, this time the stale error is not there. I still do not see any receive frame in the log corresponding to the #open10 sent on the wire. Not sure what is triggering the disconnected event.

  azure:event-hubs:linkEntity [connection-1] Acquiring cbs lock: 'negotiateCbs-39d039ed-4f61-40b9-9098-e20f65340b19' for creating the cbs session while creating the BatchingReceiver: '05ec164a-682e-43d2-b8b1-0fce5ace05a1' with address: 'test1/ConsumerGroups/$default/Partitions/0'. +5m
  azure:amqp-common:cbs The CBS client is trying to establish an AMQP connection. +5m
  rhea:io [connection-1] connected 10.104.141.137:58032 -> 52.226.36.235:5671 +12s
  rhea:frames [connection-1] -> { protocol_id: 3, major: 1, minor: 0, revision: 0 } +12s
  rhea:io [connection-1] read 8 bytes +258ms
  rhea:frames [connection-1] <- { protocol_id: 3, major: 1, minor: 0, revision: 0 } +258ms
  rhea:io [connection-1] read 63 bytes +77ms
  rhea:io [connection-1] got frame of size 63 +0ms
  rhea:raw [connection-1] RECV: 63 0000003f02010000005340c03201e02f04b3000000074d53534243425300000005504c41494e00000009414e4f4e594d4f55530000000845585445524e414c +12s
  rhea:frames [connection-1]:undefined <- sasl_mechanisms#40 {"sasl_server_mechanisms":["MSSBCBS","PLAIN","ANONYMOUS","EXTERNAL"]}  +78ms
  rhea:frames [connection-1]:0 -> function Typed(type, value, code, descriptor) {
    this.type = type;
    this.value = value;
    if (code) {
        this.array_constructor = {'typecode':code};
        if (descriptor) {
            this.array_constructor.descriptor = descriptor;
        }
    }
} ["ANONYMOUS",{"type":"Buffer","data":[0,82,111,111,116,77,97,110,97,103,101,83,104,97,114,101,100,65,99,99,101,115,115,75,101,121]},"eastusjs.servicebus.windows.net"]  +1ms
  rhea:raw [connection-1] SENT: 92 0000005c02010000005341d00000004c00000003a309414e4f4e594d4f5553a01a00526f6f744d616e6167655368617265644163636573734b6579a11f6561737475736a732e736572766963656275732e77696e646f77732e6e6574 +1ms
  rhea:io [connection-1] read 26 bytes +78ms
  rhea:io [connection-1] got frame of size 26 +0ms
  rhea:raw [connection-1] RECV: 26 0000001a02010000005344c00d025000a00857656c636f6d6521 +76ms
  rhea:frames [connection-1]:undefined <- sasl_outcome#44 {"additional_data":{"type":"Buffer","data":[87,101,108,99,111,109,101,33]}}  +76ms
  rhea:frames [connection-1]:0 -> open#10 {"transport":"tls","host":"eastusjs.servicebus.windows.net","username":"RootManageSharedAccessKey","port":5671,"servername":"eastusjs.servicebus.windows.net","id":"connection-1","container_id":"e5c93b54-8c9d-db40-9ace-7602b5f4a131","hostname":"eastusjs.servicebus.windows.net","properties":{"product":"MSJSClient","version":"0.2.6","platform":"(x64-Darwin-17.7.0)","framework":"Node/v8.11.2","user-agent":"/js-event-hubs"}}  +1ms
  rhea:raw [connection-1] SENT: 228 000000e402000000005310d0000000d40000000aa12465356339336235342d386339642d646234302d396163652d373630326235663461313331a11f6561737475736a732e736572766963656275732e77696e646f77732e6e657440404040404040d10000007d0000000aa30770726f64756374a10a4d534a53436c69656e74a30776657273696f6ea105302e322e36a308706c6174666f726da113287836342d44617277696e2d31372e372e3029a3096672616d65776f726ba10c4e6f64652f76382e31312e32a30a757365722d6167656e74a10e2f6a732d6576656e742d68756273 +1ms
  rhea:events [connection-1] Connection got event: disconnected +11s