Azure / azure-event-hubs-node

Node client library for Azure Event Hubs https://azure.microsoft.com/services/event-hubs
MIT License
50 stars 46 forks source link

Client crashing with Cannot read property 'options' of undefined #171

Closed MatejSkrbis closed 5 years ago

MatejSkrbis commented 5 years ago

After running for some time, the application randomly crashes with error:

TypeError: Cannot read property 'options' of undefined at Object.translate (...\node_modules\rhea-promise\lib\eventContext.ts:97:49) at emit (...\node_modules\rhea-promise\lib\util\utils.ts:184:20) at Object.emitEvent (...\node_modules\rhea-promise\lib\util\utils.ts:195:5) at Connection._connection.on (...\node_modules\rhea-promise\lib\connection.ts:506:10) at emitOne (events.js:116:13) at Connection.emit (events.js:211:7) at Connection.dispatch (...\node_modules\rhea\lib\connection.js:229:37) at Connection.input (...\node_modules\rhea\lib\connection.js:502:18) at emitOne (events.js:116:13) at TLSSocket.emit (events.js:211:7)

It seems as 'protocol_error' (eventName = 'protocolError') event is emitted and received at https://github.com/amqp/rhea-promise/blob/master/lib/connection.ts#L498 context is then set to: { message:"attach received on invalid channel 2" name:"ProtocolError" }

It seems it's much easier to reproduce this issue under high load when reading or writing from many partitions. With low load it can take days or more to get this error.

Package version: "@azure/event-hubs": "1.0.7"

amarzavery commented 5 years ago

So I have fixed the options of undefined error in the latest version of rhea-promise (0.1.9). Source code change in the repo is over here.

What worries me is this error

message:"attach received on invalid channel 2"
name:"ProtocolError"

It seems the sdk is trying to attach to an already closed session. Can you provide me some error logs by setting the DEBUG variable to:

export DEBUG=azure:event-hubs:error,azure-amqp-common:error,rhea-promise:error,rhea:events,rhea:frames,rhea:io,rhea:flow

That will provide me some context on the sequence of events that happened before this error occurred.

MatejSkrbis commented 5 years ago

I'm not sure if I can provide this. What I'm actually doing is reading data from partition and processing it. If I find out that processing is too slow, I stop the receiver (haven't found any throttling options to slow down receive). And then when load is processed I start the receiver again.

When I stop the receiver the debug option seems to stop outputting to the console.

I also cannot reliably reproduce this issue (on one day it crashes almost every run on high load and on other day it doesn't crash at all. I have no idea why and code was not changed between these two days)

I will continue trying to get the sequence if there will be any luck getting it.

MatejSkrbis commented 5 years ago

I left client running over weekend and found out this error when I came back.

Error: Unhandled "error" event. ([object Object])
    at Connection.emit (events.js:186:19)
    at emit (...\node_modules\rhea-promise\lib\util\utils.ts:183:20)
    at Object.emitEvent (...\node_modules\rhea-promise\lib\util\utils.ts:195:5)
    at Connection._connection.on (...\node_modules\rhea-promise\lib\connection.ts:505:9)
    at emitOne (events.js:116:13)
    at Connection.emit (events.js:211:7)
    at Connection.dispatch (...\node_modules\rhea\lib\connection.js:229:37)
    at Connection.input (...\node_modules\rhea\lib\connection.js:505:18)
    at emitOne (events.js:116:13)
    at TLSSocket.emit (events.js:211:7)

This error object also had context property with it's own stack

Error: transfer after detach
    at Incoming.on_transfer (...\node_modules\rhea\lib\session.js:360:15)
    at Session.on_transfer (...\node_modules\rhea\lib\session.js:730:19)
    at Connection.(anonymous function) [as on_transfer] (...\node_modules\rhea\lib\connection.js:734:30)
    at c.dispatch (...\node_modules\rhea\lib\types.js:909:33)
    at Transport.read (...\node_modules\rhea\lib\transport.js:108:36)
    at SaslClient.read (...\node_modules\rhea\lib\sasl.js:293:26)
    at Connection.input (...\node_modules\rhea\lib\connection.js:488:35)
    at emitOne (events.js:116:13)
    at TLSSocket.emit (events.js:211:7)
    at addChunk (_stream_readable.js:263:12)

I had env set to debug=azure:event-hubs:error,azure-amqp-common:error,rhea-promise:error,rhea:events,rhea:frames,rhea:io,rhea:flow

But messages were printed to console only for first few minutes and then they stopped, so there is no any other useful information I can provide.

amarzavery commented 5 years ago

@MatejSkrbis - That is some useful information. Looks like sdk received some messages after the receiver link was detached. Hence we see this error. Are you using the basic event hubs client or are you using the event processor host? Will try to reproduce this error. @grs - Any idea what can cause the transfer after detach error as mentioned above?

grs commented 5 years ago

@amarzavery it is raised when a transfer is received on a link that the peer already issued a detach for

amarzavery commented 5 years ago

@grs - To confirm my understanding of the above statement.

It looks like the service issued a detach for a receiver link and then sent a message on that link.

Is the above statement correct? If so, then this looks like a service side issue and I can start looping folks from the service team.

MatejSkrbis commented 5 years ago

@amarzavery I am using event hubs client. Event processor host was less stable and was crashing much more often so I switched back to event hub clients.

grs commented 5 years ago

@amarzavery yes, that is what it looks like

ramya-rao-a commented 5 years ago

@MatejSkrbis Have you see this "transfer after detach" error ever since? Since it was concluded that we might need to loop in the service team here, it would help if we know if this has happened again

MatejSkrbis commented 5 years ago

I don't remember getting this error since I updated to "@azure/event-hubs": "1.0.8". I'm not sure if it was a rhea thing or not, but I am not getting this anymore.

ramya-rao-a commented 5 years ago

Thanks @MatejSkrbis I'll close this issue for now then.

If you see this error ("transfer after detach") again, please do log an issue.