MetPX / sarracenia

https://MetPX.github.io/sarracenia
GNU General Public License v2.0
44 stars 22 forks source link

Some configs fail to connect to the broker when it gets restarted #896

Open andreleblanc11 opened 5 months ago

andreleblanc11 commented 5 months ago

Encountered during auto-patching, when both brokers got rebooted on stage.

An instance, when trying to connect to the broker, may inadvertently get its AMQP connection restarted. The logs report back with an error message.

2024-01-15 16:28:28,851 [WARNING] sr_amqp/consume: could not consume in queue q_feeder.sr_sarra.$CONFIG.ddsr-stage-shared: Basic.ack: (406) PREC
ONDITION_FAILED - unknown delivery tag 222
2024-01-15 16:28:28,851 [DEBUG] Exception details:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/sarra/sr_amqp.py", line 318, in consume
    msg = self.channel.basic_get(queuename)
  File "/usr/lib/python3/dist-packages/amqp/channel.py", line 1639, in basic_get
    wait=[spec.Basic.GetOk, spec.Basic.GetEmpty], returns_tuple=True,
  File "/usr/lib/python3/dist-packages/amqp/abstract_channel.py", line 59, in send_method
    return self.wait(wait, returns_tuple=returns_tuple)
  File "/usr/lib/python3/dist-packages/amqp/abstract_channel.py", line 79, in wait
    self.connection.drain_events(timeout=timeout)
  File "/usr/lib/python3/dist-packages/amqp/connection.py", line 471, in drain_events
    while not self.blocking_read(timeout):
  File "/usr/lib/python3/dist-packages/amqp/connection.py", line 477, in blocking_read
    return self.on_inbound_frame(frame)
  File "/usr/lib/python3/dist-packages/amqp/method_framing.py", line 55, in on_frame
    callback(channel, method_sig, buf, None)
  File "/usr/lib/python3/dist-packages/amqp/connection.py", line 481, in on_inbound_method
    method_sig, payload, content,
  File "/usr/lib/python3/dist-packages/amqp/abstract_channel.py", line 128, in dispatch_method
    listener(*args)
  File "/usr/lib/python3/dist-packages/amqp/channel.py", line 279, in _on_close
    reply_code, reply_text, (class_id, method_id), ChannelError,
amqp.exceptions.PreconditionFailed: Basic.ack: (406) PRECONDITION_FAILED - unknown delivery tag 222
2024-01-15 16:28:28,852 [DEBUG] Closed channel #1
2024-01-15 16:28:28,852 [DEBUG] Closed channel #2
2024-01-15 16:28:28,852 [DEBUG] Closed channel #3
2024-01-15 16:28:28,852 [DEBUG] Closed channel #4
2024-01-15 16:28:28,852 [DEBUG] Connecting ddsr-stage.cmc.ec.gc.ca feeder (ssl False)
2024-01-15 16:28:28,852 [DEBUG] amqp://feeder:<pw>@ddsr-stage.cmc.ec.gc.ca/ ssl=False
2024-01-15 16:28:28,852 [INFO] Using amqp module (AMQP 0-9-1)
petersilva commented 5 months ago

This is weird... a delivery tag is associated with a message... it means there was a message 222 somewhere caught between the broker and the client, and the connection was torn down to get rid of it. So it indicates data loss of some sort.

petersilva commented 5 months ago

Normally the point of recovering a connection is to recover such messages... so ... who remembered what? is the question... the client doesn't have any state, so where would it pull 222 from? very strange.

petersilva commented 4 months ago

@andreleblanc11 do you remember what we did to recover from this?

reidsunderland commented 1 month ago

I moved this comment about the PRECONDITION FAILED unknown delivery tag problem to https://github.com/MetPX/sarracenia/issues/953#issuecomment-2129676332