efrecon / tcl-stomp

A STOMP server and client library in Tcl
ISC License

Implement publisher confirms #3

Open ericsium opened 9 years ago

ericsium commented 9 years ago

This is a general question about reconnections. My feeling is that reconnections should be pretty much transparent to user code. I've been playing around with the heartbeat feature because I've been having issues with defunct RabbitMQ channels that don't get deleted. The problem is that operations occasionally executed on the server can take from 30 minutes up to hours, which basically ensures that heartbeats will cause the connection to be dropped during this long processing time. (The operation happens to be blocking, so the heartbeat code doesn't get a chance to execute; there is nothing I can do about this.) Since it's an RPC-based system, a reply-to field is set, and after the long operation completes a response is sent back to the original producer.

In this example, you can see the initial subscription and the receipt of a message. Ten seconds later the reply is generated. In this case I've ensured that the heartbeat causes the connection to die. When the response is sent, an error occurs because the connection no longer exists.

The reconnect logic then kicks in to recover the connection.

Because I register my subscriptions in a CONNECTED handler, as shown here, the subscription is recovered when we reconnect.

Register the init callback to run when we connect to the server:

::stomp::client::handler $CLIENT ::emuRegrChain::_init CONNECTED

However, there is no mechanism to retry the failed 'send' command, so the message is simply dropped.

I think what should happen here is that the connection should recover to the state it was in before it was dropped (including resubscribing) and retry the failed operation. If you automatically restore subscriptions on reconnect, another liveness state, 'SUBSCRIBE', might be needed that would execute only once, after the first connection, so that initial subscriptions could be registered.

https://www.rabbitmq.com/reliability.html
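The recovery behavior described above (replay subscriptions, then retry the failed send) could be sketched roughly as follows. This is only an illustration, not tcl-stomp code: the `::recover` namespace, its procs, and the `transport` command prefix are all hypothetical names invented for the sketch, and the real library's send and subscribe calls would have to be plugged in behind `transport`.

```tcl
# Hypothetical sketch: none of these procs are part of tcl-stomp.
namespace eval ::recover {
    variable subs {}      ;# {destination handler} pairs to replay on connect
    variable pending {}   ;# sends that failed and must be retried, in order
    variable transport {} ;# assumed command prefix that performs the real I/O
}

# Remember a subscription so it can be replayed on every (re)connect.
proc ::recover::subscribe {destination handler} {
    variable subs
    lappend subs [list $destination $handler]
}

# Try to send; on failure keep the message for the next reconnect.
proc ::recover::send {destination body} {
    variable pending
    variable transport
    if {[catch {{*}$transport SEND $destination $body}]} {
        lappend pending [list $destination $body]
        return 0
    }
    return 1
}

# Run after each CONNECTED: resubscribe first, then retry failed sends.
proc ::recover::onConnected {} {
    variable subs
    variable pending
    variable transport
    foreach s $subs {
        lassign $s destination handler
        {*}$transport SUBSCRIBE $destination $handler
    }
    set retry $pending
    set pending {}
    set i 0
    foreach msg $retry {
        if {![::recover::send {*}$msg]} {
            # send re-queued this message; keep the rest behind it, in order
            lappend pending {*}[lrange $retry [expr {$i + 1}] end]
            break
        }
        incr i
    }
}
```

With something like this, `::recover::onConnected` would be called from the CONNECTED handler already registered in this thread, so no separate 'SUBSCRIBE' state would be needed: the first connection and every reconnection go through the same replay path.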

[20150617 164359] Sending SUBSCRIBE
[20150617 164359] id:subscription-0011209-0199481
[20150617 164359] destination:/queue/chain_test_dev_1
[20150617 164359] ack:auto
[20150617 164424] Receive MESSAGE
[20150617 164424] correlation-id:d2af3861-94a2-4beb-a6bb-a35a77803911
[20150617 164424] reply-to:/reply-queue/amq.rabbitmq.reply-to.g2dkABFyYWJiaXRAYXRsaHdlLXMxMAAAfN4AAAATAQ==.bpA+KkIWhEzRkPJCSmaWrA==
[20150617 164424] timestamp:1434573864
[20150617 164424] persistent:1
[20150617 164424] user-id:ewwhite_mon
[20150617 164424] subscription:subscription-0011209-0199481
[20150617 164424] content-length:173
[20150617 164424] content-type:application/octet-stream
[20150617 164424] priority:1
[20150617 164424] destination:/queue/chain_test_dev_1
[20150617 164424] message-id:T_subscription-0011209-0199481@@session-iPJHbno1tWj3YoQBEysIqw@@1
[20150617 164424] Passing message further to subscription handler
[20150617 164434] Sending SEND
[20150617 164434] destination:/reply-queue/amq.rabbitmq.reply-to.g2dkABFyYWJiaXRAYXRsaHdlLXMxMAAAfN4AAAATAQ==.bpA+KkIWhEzRkPJCSmaWrA==
[20150617 164434] correlation-id:d2af3861-94a2-4beb-a6bb-a35a77803911
[20150617 164434] content-length:202
[20150617 164434] Error when reading from server: EOF reached, connection lost

[20150617 164434] Closing connection to atlhwe-s10:61613
[20150617 164434] Reconnecting to atlhwe-s10:61613 in 1000 ms.
[20150617 164435] Sending CONNECT
[20150617 164435] login:ewwhite_mon
[20150617 164435] passcode:guest
[20150617 164435] host:/zn/a
[20150617 164435] heart-beat:1000,100
[20150617 164435] accept-version:1.0,1.1,1.2
[20150617 164435] Receive CONNECTED
[20150617 164435] session:session-_CK5RaWAwf3m86cYo6xiKA
[20150617 164435] heart-beat:1000,1000
[20150617 164435] version:1.2
[20150617 164435] server:RabbitMQ/3.4.4
[20150617 164435] Heart-beat to server is 1000 ms.
[20150617 164435] Heart-beat from server is 1000 ms.
[20150617 164435] Passing incoming STOMP command CONNECTED further to handler ::emuRegrChain::_init
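For reference, the two "Heart-beat ... is 1000 ms" lines in the log follow from the STOMP 1.1/1.2 negotiation rule: in each direction the interval is the larger of what the sender offers and what the receiver asks for, and a 0 on either side disables heart-beats in that direction. A small sketch of that arithmetic, with `negotiate` as a made-up helper name rather than anything from tcl-stomp:

```tcl
# Negotiated heart-beat interval in ms for one direction.
# canSend:      smallest interval the sending side can guarantee (0 = cannot send)
# wantsReceive: interval the receiving side would like (0 = does not want any)
proc negotiate {canSend wantsReceive} {
    if {$canSend == 0 || $wantsReceive == 0} { return 0 }
    return [expr {max($canSend, $wantsReceive)}]
}

# Values from the log: the client sent 1000,100 and the server answered 1000,1000.
lassign {1000 100} cx cy
lassign {1000 1000} sx sy
puts "to server:   [negotiate $cx $sy] ms"   ;# prints "to server:   1000 ms"
puts "from server: [negotiate $sx $cy] ms"   ;# prints "from server: 1000 ms"
```

With both directions at 1000 ms, a handler that blocks for ten seconds, as in the log, cannot send several expected heart-beats, which is consistent with the broker closing the connection before the reply is written.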

ericsium commented 9 years ago

I think what I'm actually asking for here is support for publisher confirms, so transactions would have to be acked by the server. The current implementation (dropping the message) seems to be fine. If a disconnection occurs and confirms were requested, unacked messages would need to be resent.

https://www.rabbitmq.com/confirms.html#when

From my perspective at least this would be a low priority enhancement.
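One way the resend-on-reconnect idea could look at the STOMP level is to attach a `receipt` header to each SEND and treat the matching RECEIPT frame (which carries a `receipt-id` header) as the confirmation. The sketch below is an assumption-laden illustration, not the library's API: the `::confirm` namespace, its procs, and the `transport` command prefix are hypothetical, and whether a receipt counts as a full publisher confirm depends on the broker.

```tcl
# Hypothetical sketch: none of these procs are part of tcl-stomp.
namespace eval ::confirm {
    variable unacked [dict create] ;# receipt id -> {destination body}
    variable seq 0                 ;# counter used to build receipt ids
    variable transport {}          ;# assumed command prefix doing the real SEND
}

# Send with a receipt id and remember the message until it is confirmed.
proc ::confirm::send {destination body} {
    variable unacked
    variable seq
    variable transport
    set id "rcpt-[incr seq]"
    dict set unacked $id [list $destination $body]
    # A failed write is not fatal here: the message stays in the outbox.
    catch {{*}$transport $destination $body $id}
    return $id
}

# Call when a RECEIPT frame arrives; id is its receipt-id header.
proc ::confirm::onReceipt {id} {
    variable unacked
    dict unset unacked $id
}

# Call after reconnecting: resend everything never confirmed, oldest first.
proc ::confirm::onConnected {} {
    variable unacked
    variable transport
    dict for {id msg} $unacked {
        lassign $msg destination body
        catch {{*}$transport $destination $body $id}
    }
}
```

Note that a message whose receipt was lost along with the connection would be sent twice under this scheme, so consumers would need to tolerate duplicates, for example by checking the correlation-id.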