Azure / azure-functions-rabbitmq-extension

RabbitMQ extension for Azure Functions
MIT License

RabbitMq Connection Reset Issue #159

Open abhishek-loyalytics opened 3 years ago

abhishek-loyalytics commented 3 years ago

Unhandled exception. RabbitMQ.Client.Exceptions.AlreadyClosedException: Already closed: The AMQP operation was interrupted: AMQP close-reason, initiated by Library, code=541, text='Unexpected Exception', classId=0, methodId=0, cause=System.IO.IOException: Unable to read data from the transport connection: Connection reset by peer.
 ---> System.Net.Sockets.SocketException (104): Connection reset by peer
   at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size)
   --- End of inner exception stack trace ---
   at RabbitMQ.Client.Impl.InboundFrame.ReadFrom(NetworkBinaryReader reader)
   at RabbitMQ.Client.Framing.Impl.Connection.MainLoopIteration()
   at RabbitMQ.Client.Framing.Impl.Connection.MainLoop()
   at RabbitMQ.Client.Impl.SessionBase.Transmit(Command cmd)
   at Microsoft.Azure.WebJobs.Extensions.RabbitMQ.RabbitMQListener.<>c__DisplayClass20_0.<b0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Threading.Tasks.Task.<>c.b139_1(Object state)
   at System.Threading.QueueUserWorkItemCallbackDefaultContext.Execute()
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading._ThreadPoolWaitCallback.PerformWaitCallback()

npalmius commented 1 year ago

We have been seeing this error too, under quite specific conditions, so I am adding my findings to this ticket in case they are useful.

We are running a Python (currently Python 3.7) function app host (v3.17.0) with the RabbitMQ extension trigger (v2.0.3.0) in an app service image (based on 3.0-python3.7-appservice) on Kubernetes.

In our case, it appeared that our image didn't contain the pre-compiled Python scripts (i.e. the __pycache__ directories) for our functions, so these were generated on the first function execution. Writing those files would trigger a restart of the JobHost by the FileMonitoringService, which would stop the RabbitMQListener while the function was still executing. By the time the function finished, the RabbitMQ connection had already been closed, which triggered the error in the ticket.

It doesn't appear to be possible to disable the file-watching behaviour of the function app host, so we think that we have resolved this by pre-compiling our Python functions when the image is built, so that no new files are written at runtime and the JobHost restart is not triggered. We'll be monitoring it to see if we still see this error.
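For anyone wanting to try the same workaround, here is a minimal sketch of the pre-compilation step using only the standard library's compileall module. The helper names precompile and missing_bytecode are hypothetical (they are not part of the function host or the extension), and this is not necessarily exactly how we do it in our own build. It assumes the script is run at image build time with the same Python version that the function host uses, since the cached file names depend on the interpreter.

```python
import compileall
import sys
from pathlib import Path


def precompile(app_root):
    """Byte-compile every .py file under app_root.

    Returns True only if every file compiled without error.
    """
    # quiet=1 prints errors only; force=True recompiles even when an
    # existing .pyc already looks up to date.
    return bool(compileall.compile_dir(app_root, quiet=1, force=True))


def missing_bytecode(app_root):
    """Return the .py files that have no cached .pyc for this interpreter."""
    tag = sys.implementation.cache_tag  # e.g. "cpython-37" on Python 3.7
    missing = []
    for source in Path(app_root).rglob("*.py"):
        cached = source.parent / "__pycache__" / "{}.{}.pyc".format(source.stem, tag)
        if not cached.exists():
            missing.append(str(source))
    return sorted(missing)
```

In a Dockerfile this could be called on the directory that holds the function code (whatever path the image uses for the app root), failing the build if precompile returns False or missing_bytecode returns a non-empty list. The same effect can be had from the command line with python -m compileall on that directory.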

IMO, it is unclear where the fault is here - should the function app host be restarting running functions? Or can the RabbitMQ extension somehow handle this?

There isn't much detail in the original ticket here, but I just wanted to add this in case it is relevant.