elexon-data / iris-clients

Clients for connecting to the Insights Real-time Information Service
MIT License
16 stars 7 forks source link

Sporadic ServiceBusError when launching the client #9

Closed Samreay closed 5 months ago

Samreay commented 1 year ago

Hi team,

It looks like the provided example is not robust.

Often it runs for a while, downloading data as expected, but then it will eventually crash with The lock on the message lock has expired

INFO:azure.identity._internal.get_token_mixin:ClientSecretCredential.get_token succeeded
INFO:uamqp.connection:Connection b'SBReceiver-e6240293-4776-4aae-ae29-c71e08474980' state changed from <ConnectionState.START: 0> to <ConnectionState.START: 0>
INFO:uamqp.connection:Connection b'SBReceiver-e6240293-4776-4aae-ae29-c71e08474980' state changed from <ConnectionState.START: 0> to <ConnectionState.HDR_SENT: 2>
INFO:uamqp.connection:Connection b'SBReceiver-e6240293-4776-4aae-ae29-c71e08474980' state changed from <ConnectionState.HDR_SENT: 2> to <ConnectionState.HDR_EXCH: 3>
INFO:uamqp.connection:Connection b'SBReceiver-e6240293-4776-4aae-ae29-c71e08474980' state changed from <ConnectionState.HDR_EXCH: 3> to <ConnectionState.OPEN_SENT: 7>
INFO:uamqp.connection:Connection b'SBReceiver-e6240293-4776-4aae-ae29-c71e08474980' state changed from <ConnectionState.OPEN_SENT: 7> to <ConnectionState.OPENED: 9>
INFO:uamqp.c_uamqp:CBS for connection b'SBReceiver-e6240293-4776-4aae-ae29-c71e08474980' completed opening with status: 1
INFO:uamqp.c_uamqp:Token put complete with result: 1, status: 202, description: b'Accepted', connection: b'SBReceiver-e6240293-4776-4aae-ae29-c71e08474980'
INFO:uamqp.receiver:Message receiver b'receiver-link-38d7df10-369c-443d-b2a2-992bb24be28c' state changed from <MessageReceiverState.Idle: 1> to <MessageReceiverState.Opening: 2> on connection: b'SBReceiver-e6240293-4776-4aae-ae29-c71e08474980'
INFO:uamqp.receiver:Message receiver b'receiver-link-38d7df10-369c-443d-b2a2-992bb24be28c' state changed from <MessageReceiverState.Opening: 2> to <MessageReceiverState.Open: 3> on connection: b'SBReceiver-e6240293-4776-4aae-ae29-c71e08474980'
INFO:root:Downloading data to ./data/BOALF/BOALF_230314T193850_559861.json
INFO:root:Downloading data to ./data/RURE/RURE_230314T193851_155518.json
INFO:root:Downloading data to ./data/RURE/RURE_230314T193851_747757.json
INFO:root:Downloading data to ./data/NDF/NDF_230314T193852_342917.json
INFO:root:Downloading data to ./data/IMBALNGC/IMBALNGC_230314T193853_760876.json
INFO:root:Downloading data to ./data/TSDF/TSDF_230314T193854_637336.json
INFO:root:Downloading data to ./data/INDGEN/INDGEN_230314T193856_938570.json
INFO:root:Downloading data to ./data/INDDEM/INDDEM_230314T193858_698978.json
INFO:root:Downloading data to ./data/BOALF/BOALF_230314T193859_317926.json
INFO:root:Downloading data to ./data/FUELINST/FUELINST_230314T193859_960078.json
INFO:root:Downloading data to ./data/FREQ/FREQ_230314T193900_595458.json
INFO:root:Downloading data to ./data/MELNGC/MELNGC_230314T193902_275859.json
INFO:root:Downloading data to ./data/BOALF/BOALF_230314T193902_889906.json
INFO:root:Downloading data to ./data/UOU2T3YW/UOU2T3YW_230314T194047_138218.json
INFO:uamqp.receiver:Message receiver b'receiver-link-38d7df10-369c-443d-b2a2-992bb24be28c' state changed from <MessageReceiverState.Open: 3> to <MessageReceiverState.Closing: 4> on connection: b'SBReceiver-e6240293-4776-4aae-ae29-c71e08474980'
INFO:uamqp.connection:Shutting down connection b'SBReceiver-e6240293-4776-4aae-ae29-c71e08474980'.
INFO:uamqp.authentication.cbs_auth:Shutting down CBS session on connection: b'SBReceiver-e6240293-4776-4aae-ae29-c71e08474980'.
INFO:uamqp.authentication.cbs_auth:Auth closed, destroying session on connection: b'SBReceiver-e6240293-4776-4aae-ae29-c71e08474980'.
INFO:uamqp.authentication.cbs_auth:Finished shutting down CBS session on connection: b'SBReceiver-e6240293-4776-4aae-ae29-c71e08474980'.
INFO:uamqp.connection:Connection b'SBReceiver-e6240293-4776-4aae-ae29-c71e08474980' state changed from <ConnectionState.OPENED: 9> to <ConnectionState.END: 13>
INFO:uamqp.connection:Connection shutdown complete b'SBReceiver-e6240293-4776-4aae-ae29-c71e08474980'.
Traceback (most recent call last):
  File "/Users/sh/Projects/iris-clients/client.py", line 67, in run
    receiver.complete_message(msg)
  File "/Users/sh/miniconda3/envs/iris/lib/python3.9/site-packages/azure/servicebus/_servicebus_receiver.py", line 853, in complete_message
    self._settle_message_with_retry(message, MESSAGE_COMPLETE)
  File "/Users/sh/miniconda3/envs/iris/lib/python3.9/site-packages/azure/servicebus/_servicebus_receiver.py", line 490, in _settle_message_with_retry
    raise ServiceBusError(
azure.servicebus.exceptions.ServiceBusError: The lock on the message lock has expired.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/sh/Projects/iris-clients/client.py", line 75, in <module>
    run()
  File "/Users/sh/Projects/iris-clients/client.py", line 70, in run
    receiver.abandon_message(msg)
  File "/Users/sh/miniconda3/envs/iris/lib/python3.9/site-packages/azure/servicebus/_servicebus_receiver.py", line 877, in abandon_message
    self._settle_message_with_retry(message, MESSAGE_ABANDON)
  File "/Users/sh/miniconda3/envs/iris/lib/python3.9/site-packages/azure/servicebus/_servicebus_receiver.py", line 490, in _settle_message_with_retry
    raise ServiceBusError(
azure.servicebus.exceptions.ServiceBusError: The lock on the message lock has expired.

To expand on this, I added some debug logging inside the retrieved and got this:

Message lock has was until 2023-03-14 10:33:41.366000+00:00 and now is 2023-03-14 10:33:49.760031+00:00

So it seems the message is locked until 2023-03-14 10:33:41.3, but its received seconds after this. Im curious as to why the expiry is so short? I have no idea if this is a parameter Azure sets or some part of the message definition, I'm brand new to this style of message brokering.

How to reproduce

Run the python client.py, and wait between 10 seconds and 5 mintues.

Things Ive tried

Hoped that an AutoLockRenewer with multiple workers would help, but nope. What I added below.

    receiver = client.get_queue_receiver(
        settings.QueueName,
        max_wait_time=300,
        auto_lock_renewer=AutoLockRenewer(max_lock_renewal_duration=300, max_workers=4),
    )

Also tried and commented out the save_message part, just in case it was the processing time of saving to disk that took us past expiry, but this also didn't help. Even returning the complete_message instantly after receiving still ets that ServiceBusError: The lock on the message lock has expired.`

I beleive I can stop the errors by changing the receive mode, adding to get_queue_receiver:

receive_mode=ServiceBusReceiveMode.RECEIVE_AND_DELETE,

And then commenting out the .complete_message and .abandon_message, but obviously this isn't ideal because half the point of the messaging service is being able to fail but still receive undigested messages on a different queue. Removing resiliency to just get the client working stably doesnt seem like a good idea.

ZakCannon commented 5 months ago

This issue is being closed in favour of #16