Closed: gabrielSoudry closed this issue 1 year ago
/cc @jeremymeng
This error, "New receiver 'nil' with higher epoch of '0' is created hence current receiver 'nil' with epoch '0' is getting disconnected. If you are recreating the receiver, make sure a higher epoch is used", most commonly happens when a consumer tries to read from a partition that is already being read by another consumer. For more details, check out the troubleshooting guide. Generally speaking, when such an error occurs the consumer should be restarted and subscribe to a different set of partitions.
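As background, the usual way to avoid two readers competing for the same partition is to let multiple EventHubConsumerClient instances in the same consumer group share a checkpoint store, so partition ownership is load-balanced across them. Below is a minimal sketch of that setup; the connection strings, names, and consumer group are placeholders, not values from this thread.

```typescript
import { EventHubConsumerClient, earliestEventPosition } from "@azure/event-hubs";
import { ContainerClient } from "@azure/storage-blob";
import { BlobCheckpointStore } from "@azure/eventhubs-checkpointstore-blob";

// Placeholder values -- substitute your own resources.
const eventHubConnectionString = "<event-hub-connection-string>";
const eventHubName = "<event-hub-name>";
const storageConnectionString = "<storage-connection-string>";
const containerName = "<blob-container-name>";

async function main() {
  // Every instance sharing this checkpoint store and consumer group
  // coordinates partition ownership instead of fighting over epochs.
  const containerClient = new ContainerClient(storageConnectionString, containerName);
  const checkpointStore = new BlobCheckpointStore(containerClient);

  const consumerClient = new EventHubConsumerClient(
    "$Default", // consumer group
    eventHubConnectionString,
    eventHubName,
    checkpointStore
  );

  const subscription = consumerClient.subscribe(
    {
      processEvents: async (events, context) => {
        for (const event of events) {
          console.log(`partition ${context.partitionId}:`, event.body);
        }
      },
      processError: async (err, context) => {
        console.error(`error on partition ${context.partitionId}:`, err);
      },
    },
    { startPosition: earliestEventPosition }
  );
  // Call subscription.close() and consumerClient.close() on shutdown.
}

main().catch(console.error);
```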
We are facing the exact same issue, and because of it our production release has been blocked for a month.
Is anyone aware of any workaround? Maybe keeping some of the dependencies on a previous version? @deyaaeldeen
NVM, I got a workaround for now.
One can work around the issue by pinning the following versions in package.json: "@azure/service-bus": "7.8.1", "@azure/core-amqp": "3.2.2", "@azure/event-hubs": "5.8.0".
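For reference, the same pins expressed as a package.json fragment (exact versions, no caret ranges; the rest of the file is omitted):

```json
{
  "dependencies": {
    "@azure/service-bus": "7.8.1",
    "@azure/core-amqp": "3.2.2",
    "@azure/event-hubs": "5.8.0"
  }
}
```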
thank you @KulkarniSiddhesh for the extra information, we will continue to investigate.
/cc @HarshaNalluru
@KulkarniSiddhesh thanks for the report! I have some questions:
1. Does @azure/event-hubs 5.9.0 also have the same issue, and is that why you downgraded to 5.8.0?
2. Does @azure/service-bus matter? That is, if you use @azure/service-bus 7.9.0 and @azure/event-hubs 5.8.0, will the issue happen?

I tried it with no success; same behavior, the ownership of partitions keeps changing without any reason. We have "@azure/core-amqp": "3.2.2" and "@azure/event-hubs": "5.8.0". We don't have "@azure/service-bus": "7.8.1".
Perhaps add more meaningful logs in debug mode about "why" my consumer loses its partitions, and add a log when another consumer starts to acquire the lock on a partition it did not own before.
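As a side note, verbose client logging can already be turned on to observe the SDK's load-balancing activity; a minimal sketch (the chosen log level is just an example):

```typescript
import { setLogLevel } from "@azure/logger";

// Route all Azure SDK logs, including @azure/event-hubs load-balancing
// messages, to the console at verbose level.
setLogLevel("verbose");

// Alternatively, set the level via the environment before starting the app:
//   AZURE_LOG_LEVEL=verbose node app.js
```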
@jeremymeng Yes, it worked with the combination of the versions for me.
@gabrielSoudry, here is the link for 7.8.1 of @azure/service-bus: https://www.npmjs.com/package/@azure/service-bus/v/7.8.1
We recently released version 5.11.1 of @azure/event-hubs to address an important bug in retrieving updated authentication tokens: https://www.npmjs.com/package/@azure/event-hubs/v/5.11.1. I'm not sure that it will totally resolve the issue in this thread, but it should at least help with narrowing the cause should it continue to reproduce. Can you give this version a try?
@jeremymeng Yes, it worked with the combination of the versions for me.
@KulkarniSiddhesh Sorry, it is not clear to me whether this is the answer to Question 1 or Question 2, or both.
@KulkarniSiddhesh could you please clarify?
@gabrielSoudry could you please confirm if the issue has been resolved with v5.11.1?
Perhaps add more meaningful logs in debug mode about "why" my consumer loses its partitions, and add a log when another consumer starts to acquire the lock on a partition it did not own before.
The "why" part is pretty much due to scaling or an existing consumer dying. The SDK doesn't know why a partition becomes available; it just pings the checkpoint store to learn about the current state of the world. Please note that a more involved orchestration can be costly. Furthermore, we already log when a consumer takes over a partition.
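For instance, a consumer can observe when it gains or loses a partition through the optional subscribe handlers; a sketch assuming `consumerClient` is an existing `EventHubConsumerClient`:

```typescript
// Assumes `consumerClient` is an existing EventHubConsumerClient instance.
consumerClient.subscribe({
  processInitialize: async (context) => {
    // Called when this instance claims ownership of a partition.
    console.log(`claimed partition ${context.partitionId}`);
  },
  processClose: async (reason, context) => {
    // Called when the partition processor stops; `reason` indicates why
    // (e.g. ownership lost to another consumer, or shutdown).
    console.log(`closing partition ${context.partitionId}: ${reason}`);
  },
  processEvents: async (events, context) => {
    // ... handle events ...
  },
  processError: async (err, context) => {
    console.error(`error on partition ${context.partitionId}:`, err);
  },
});
```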
Hi @gabrielSoudry. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario please provide a response to the question asked above or the information requested above. This will help us more accurately address your issue.
Great news :) By updating the Azure Identity library, moving to Azure Workload Identity, and updating to 5.11.1, we no longer have the problem. I'll wait another day to reconfirm and then you can close the ticket.
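For context, a minimal sketch of a credential-based setup that picks up Azure Workload Identity inside the pod; the namespace, hub, account, and container values below are placeholders:

```typescript
import { DefaultAzureCredential } from "@azure/identity";
import { EventHubConsumerClient } from "@azure/event-hubs";
import { ContainerClient } from "@azure/storage-blob";
import { BlobCheckpointStore } from "@azure/eventhubs-checkpointstore-blob";

// DefaultAzureCredential picks up Workload Identity (a federated token)
// automatically when the pod is configured for it.
const credential = new DefaultAzureCredential();

const checkpointStore = new BlobCheckpointStore(
  new ContainerClient("https://<account>.blob.core.windows.net/<container>", credential)
);

const consumerClient = new EventHubConsumerClient(
  "$Default",
  "<namespace>.servicebus.windows.net",
  "<event-hub-name>",
  credential,
  checkpointStore
);
```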
Hi @gabrielSoudry. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text "/unresolve" to remove the "issue-addressed" label and continue the conversation.
Describe the bug
Our pods containing the workers that consume messages restart after a certain time (randomly, every 8 hours or so) and do not try to reconnect to the event hub. Similar to https://github.com/Azure/azure-sdk-for-js/issues/15893, the ownership of partitions keeps changing even though every application instance is running 24*7, and the pods restart instead of trying to re-establish the connection.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Try to reconnect and not crash.