Azure / azure-event-hubs-node

Node client library for Azure Event Hubs https://azure.microsoft.com/services/event-hubs
MIT License
50 stars 44 forks source link

Credits of the sender goes down to 0 #84

Closed LukaszBrygala closed 6 years ago

LukaszBrygala commented 6 years ago

Hello,

I'm using the package to send data to Azure Event Hubs. After turning on debug messages I've observed that our sender consumes all the credits it received and remains in that state - it never receives more credits and can't send more messages.

Is this a problem with the package, or I've missed something in configuration? I'm also wondering if this can be related to the way Event Hubs is used from receiver side (which I personally don't have access to).

Thanks for any explanations you can provide.

amarzavery commented 6 years ago

@LukaszBrygala - You need to await on sending the message. Can you share the snippet of code for the sender?

LukaszBrygala commented 6 years ago

@amarzavery - We are awaiting the sendBatch method. See snippet below.

I've observed that sometimes the amount of credits increases - especially after dropping from 300 to 200, it goes back to 300 again. But after ~2 days of working it reaches 0 and remains like this.

const amqp = require("amqplib");
const { EventHubClient } = require("azure-event-hubs");
const prepareMessages = require("./prepareMessages");

let batch = [];

async function sendBatch(sender) {
  const toSend = batch.slice();
  batch.length = 0;
  await sender.sendBatch(toSend);
}

async function consumePayload(payload, sender) {
  const messages = prepareMessages(payload);
  if (!messages) {
    return;
  }
  batch = [...batch, ...messages];
  if (batch.length < 100) {
    return;
  }
  await sendBatch(sender);
}

async function startMessagingServer(log, config) {
  const { CONNECTION_STRING, PATH, AMQP_URL } = config;

  try {
    const sender = EventHubClient.createFromConnectionString(CONNECTION_STRING, PATH);
    const connection = await amqp.connect(AMQP_URL);
    const channel = await connection.createChannel();
    channel.consume("q", async payload => {
      try {
        await consumePayload(payload, sender);
      } catch (err) {
        log.error(err, "Message handle error");
      }
      channel.ack(payload);
    });
  } catch (err) {
    log.error(err, "Error starting messaging server");
    process.exit(1);
  }
}

module.exports = startMessagingServer;
amarzavery commented 6 years ago

Is there a reason to use the amqplib ?

Can you turn on the debug logs: export DEBUG=azure*,rhea*? That will help us understand what is going on..

We let the amqp library (rhea) manage the credits. It will replenish the credits automatically as the sent messages are settled.

But after ~2 days of working it reaches 0 and remains like this.

I suspect, the messages may not be getting settled. Hence the credits may not be replenished. Debug logs will help us get a clear picture.

LukaszBrygala commented 6 years ago

I am using amqplib to consume input from RabbitMQ which is then forwarded in the batches to Event Hubs. I will try to add rhea* to logs and let you know what I will get. Expect some news after couple of days. Thanks for your help!

LukaszBrygala commented 6 years ago

(Following logs are "grepped" like this: grep -v "rhea:message\|rhea:frames\|datatransformer" | grep "rhea\|azure:event-hubs".)

For some time it looks all right with logs like this:

2018-06-21T06:09:41.631Z azure:event-hubs:sender [connection-1] Sender 'a35e994a-b4d9-44d7-9ed1-75ecbb6403e3', trying to send EventData[]: [ { body:
2018-06-21T06:09:41.644Z azure:event-hubs:sender [connection-1]Sender 'a35e994a-b4d9-44d7-9ed1-75ecbb6403e3', sending encoded batch message. <Buffer 00 ... >
2018-06-21T06:09:41.645Z azure:event-hubs:sender [connection-1] Sender 'a35e994a-b4d9-44d7-9ed1-75ecbb6403e3', credit: 241 available: 2048
2018-06-21T06:09:41.645Z azure:event-hubs:sender [connection-1] Sender 'a35e994a-b4d9-44d7-9ed1-75ecbb6403e3', sending message: <Buffer 00 ... >
2018-06-21T06:09:41.645Z azure:event-hubs:sender [connection-1] Sender 'a35e994a-b4d9-44d7-9ed1-75ecbb6403e3', sent message with delivery id: 959
2018-06-21T06:09:41.645Z azure:event-hubs:retry Retry attempt number: 1
2018-06-21T06:09:41.646Z rhea:raw [connection-1] SENT: <Buffer 00 ... >
2018-06-21T06:09:41.663Z rhea:io [connection-1] read 27 bytes
2018-06-21T06:09:41.663Z rhea:io [connection-1] got frame of size 27
2018-06-21T06:09:41.664Z rhea:events Received disposition for outgoing transfers
2018-06-21T06:09:41.664Z rhea:events Link got event: accepted
2018-06-21T06:09:41.664Z azure:event-hubs:sender [connection-1] Sender 'a35e994a-b4d9-44d7-9ed1-75ecbb6403e3', got event accepted.
2018-06-21T06:09:41.664Z rhea:events Link got event: settled
2018-06-21T06:09:41.664Z rhea:events Session got event: settled
2018-06-21T06:09:41.664Z rhea:events Connection got event: settled
2018-06-21T06:09:41.664Z rhea:events Container got event: settled
2018-06-21T06:09:41.664Z azure:event-hubs:retry Success, after attempt number: 1.

Then something interesting (success result and partition undefined?):

2018-06-21T06:26:26.730Z azure:event-hubs:rpc [connection-1] $cbs response:  {"correlation_id":"64708640-a761-44a0-9dc4-c26072d75548","application_properties":{"status-code":202,"status-description":"Accepted"}}
2018-06-21T06:26:26.730Z azure:event-hubs:rpc [connection-1] request-messageId | '64708640-a761-44a0-9dc4-c26072d75548' == '64708640-a761-44a0-9dc4-c26072d75548' | response-correlationId.
2018-06-21T06:26:26.730Z azure:event-hubs:retry Success, after attempt number: 1.
2018-06-21T06:26:26.730Z azure:event-hubs:retry Success result: undefined
2018-06-21T06:26:26.731Z rhea:raw [connection-1] SENT: <Buffer 00 00 00 1e 02 00 00 00 00 53 15 d0 00 00 00 0e 00 00 00 05 41 52 17 52 17 41 00 53 24 45>
2018-06-21T06:26:26.731Z azure:event-hubs:sender [connection-1] Negotiated claim for sender '2582cacc-a0e4-4ebb-b965-9b8dfac4300c' with with partition: undefined
2018-06-21T06:26:26.731Z azure:event-hubs:sender [connection-1]Sender '2582cacc-a0e4-4ebb-b965-9b8dfac4300c', has next token renewal in 2700 seconds @(Thu Jun 21 2018 07:11:26 GMT+0000 (UTC)).

After some time I observe closing events:

2018-06-21T06:30:47.496Z azure:event-hubs:sender [connection-1] Sender '2582cacc-a0e4-4ebb-b965-9b8dfac4300c', trying to send EventData[]: [ { body:
2018-06-21T06:30:47.507Z azure:event-hubs:sender [connection-1]Sender '2582cacc-a0e4-4ebb-b965-9b8dfac4300c', sending encoded batch message. <Buffer 00 ... >
2018-06-21T06:30:47.507Z azure:event-hubs:sender [connection-1] Sender '2582cacc-a0e4-4ebb-b965-9b8dfac4300c', credit: 218 available: 2048
2018-06-21T06:30:47.507Z azure:event-hubs:sender [connection-1] Sender '2582cacc-a0e4-4ebb-b965-9b8dfac4300c', sending message: <Buffer 00 ... >
2018-06-21T06:30:47.507Z azure:event-hubs:sender [connection-1] Sender '2582cacc-a0e4-4ebb-b965-9b8dfac4300c', sent message with delivery id: 982
2018-06-21T06:30:47.507Z azure:event-hubs:retry Retry attempt number: 1
2018-06-21T06:30:47.509Z rhea:raw [connection-1] SENT: <Buffer 00 ... >
2018-06-21T06:30:47.519Z rhea:io [connection-1] read 27 bytes
2018-06-21T06:30:47.519Z rhea:io [connection-1] got frame of size 27
2018-06-21T06:30:47.519Z rhea:events Received disposition for outgoing transfers
2018-06-21T06:30:47.520Z rhea:events Link got event: accepted
2018-06-21T06:30:47.520Z azure:event-hubs:sender [connection-1] Sender '2582cacc-a0e4-4ebb-b965-9b8dfac4300c', got event accepted.
2018-06-21T06:30:47.520Z rhea:events Link got event: settled
2018-06-21T06:30:47.520Z rhea:events Session got event: settled
2018-06-21T06:30:47.520Z rhea:events Connection got event: settled
2018-06-21T06:30:47.520Z rhea:events Container got event: settled
2018-06-21T06:30:47.520Z azure:event-hubs:retry Success, after attempt number: 1.
2018-06-21T06:31:37.239Z rhea:io [connection-1] read 17 bytes
2018-06-21T06:31:37.239Z rhea:io [connection-1] got frame of size 17
2018-06-21T06:31:37.240Z rhea:events Link got event: sender_close
2018-06-21T06:31:37.240Z rhea:events Session got event: sender_close
2018-06-21T06:31:37.240Z rhea:events Connection got event: sender_close
2018-06-21T06:31:37.240Z rhea:events Container got event: sender_close
2018-06-21T06:31:37.240Z rhea:io [connection-1] read 80 bytes
2018-06-21T06:31:37.240Z rhea:io [connection-1] got frame of size 18
2018-06-21T06:31:37.240Z rhea:events Link got event: receiver_close
2018-06-21T06:31:37.240Z rhea:events Session got event: receiver_close
2018-06-21T06:31:37.240Z rhea:events Connection got event: receiver_close
2018-06-21T06:31:37.240Z rhea:events Container got event: receiver_close
2018-06-21T06:31:37.240Z rhea:io [connection-1] got frame of size 15
2018-06-21T06:31:37.241Z rhea:events Session got event: session_close
2018-06-21T06:31:37.241Z rhea:events Connection got event: session_close
2018-06-21T06:31:37.242Z rhea:events Container got event: session_close
2018-06-21T06:31:37.242Z rhea:io [connection-1] got frame of size 17
2018-06-21T06:31:37.242Z rhea:events Link got event: sender_close
2018-06-21T06:31:37.242Z rhea:events Session got event: sender_close
2018-06-21T06:31:37.242Z rhea:events Connection got event: sender_close
2018-06-21T06:31:37.242Z rhea:events Container got event: sender_close
2018-06-21T06:31:37.242Z rhea:io [connection-1] got frame of size 15
2018-06-21T06:31:37.242Z rhea:events Session got event: session_close
2018-06-21T06:31:37.242Z rhea:events Connection got event: session_close
2018-06-21T06:31:37.242Z rhea:events Container got event: session_close
2018-06-21T06:31:37.242Z rhea:io [connection-1] got frame of size 15
2018-06-21T06:31:37.243Z rhea:events Connection got event: connection_close
2018-06-21T06:31:37.243Z rhea:events Container got event: connection_close
2018-06-21T06:31:37.243Z rhea:raw [connection-1] SENT: <Buffer 00 00 00 0c 02 00 00 00 00 53 17 45>
2018-06-21T06:31:37.243Z rhea:raw [connection-1] SENT: <Buffer 00 00 00 0c 02 00 00 01 00 53 17 45>
2018-06-21T06:31:37.243Z rhea:raw [connection-1] SENT: <Buffer 00 00 00 0c 02 00 00 00 00 53 18 45>

After some time:

2018-06-21T07:11:26.732Z azure:event-hubs:sender [connection-1] Acquiring lock: 'negotiateCbs-e2fa05e4-606c-4b44-88e2-1b4914656788' for creating the cbs session while creating the sender: '2582cacc-a0e4-4ebb-b965-9b8dfac4300c'.
2018-06-21T07:11:26.732Z azure:event-hubs:cbs [connection-1] CBS session is already present. Reusing the cbs sender '7722ddc1-5387-e241-a5ca-06f0f57e8c4d' and receiver 'cbs-8d2a6313-42ab-413a-9c76-ecdf13109eaf' links over cbs session.
2018-06-21T07:11:26.732Z azure:event-hubs:sender [connection-1] EH Sender: calling negotiateClaim for audience 'sb://<...>.servicebus.windows.net/<...>'.
2018-06-21T07:11:26.732Z azure:event-hubs:sender [connection-1] Acquiring lock: 'negotiateClaim-e6be22f7-d8e3-4ec2-9f08-a041261d0942' for cbs auth for sender: '2582cacc-a0e4-4ebb-b965-9b8dfac4300c'.
2018-06-21T07:11:26.733Z azure:event-hubs:rpc [connection-1] $cbs request sent: { body: 'SharedAccessSignature sr=',
...
2018-06-21T07:11:36.735Z azure:event-hubs:retry Error occured in attempt number 1: { ServiceUnavailableError: The request with message_id "0a6398f7-0fb5-45ac-a6ec-05bb24251a7f" to "$cbs" endpoint timed out. Please try again later.
2018-06-21T07:11:36.735Z azure:event-hubs:retry Sleeping for 15 seconds.
2018-06-21T07:11:51.736Z azure:event-hubs:retry Retry attempt number: 2
2018-06-21T07:11:51.737Z azure:event-hubs:retry Error occured in attempt number 2: { ServiceUnavailableError: The request with message_id "0a6398f7-0fb5-45ac-a6ec-05bb24251a7f" to "$cbs" endpoint timed out. Please try again later.
2018-06-21T07:11:51.737Z azure:event-hubs:retry Sleeping for 15 seconds.
2018-06-21T07:12:06.737Z azure:event-hubs:retry Retry attempt number: 3
2018-06-21T07:12:06.738Z azure:event-hubs:retry Error occured in attempt number 3: { ServiceUnavailableError: The request with message_id "0a6398f7-0fb5-45ac-a6ec-05bb24251a7f" to "$cbs" endpoint timed out. Please try again later.
2018-06-21T07:12:06.738Z azure:event-hubs:retry Sleeping for 15 seconds.
2018-06-21T07:12:21.740Z azure:event-hubs:cbs [connection-1]An error occurred while negotating the cbs claim: { ServiceUnavailableError: The request with message_id "0a6398f7-0fb5-45ac-a6ec-05bb24251a7f" to "$cbs" endpoint timed out. Please try again later.
2018-06-21T07:12:21.740Z azure:event-hubs:sender [connection-1] Sender '2582cacc-a0e4-4ebb-b965-9b8dfac4300c', an error occurred while renewing the token: { ServiceUnavailableError: The request with message_id "0a6398f7-0fb5-45ac-a6ec-05bb24251a7f" to "$cbs" endpoint timed out. Please try again later.

Credits are going down, no logs from rhea now despite this:

2018-06-21T09:27:52.441Z rhea:frames [connection-1] PENDING: '{"channel":1,"type":0,"performative":[0,1148,{"type":"Buffer","data":[49,49,52,56]},2147563264,false,false],"payload":{"type":"Buffer","data":[0, ...]}}'
[connection-1] error: {"message":"write after end"}
2018-06-21T09:27:52.442Z rhea:raw [connection-1] SENT: <Buffer 00 ... >

It looks to me that I'm getting disconnected and there is no successful reconnection attempt after that. Probably I should react to some "close" event and try to perform reconnect in my code?

amarzavery commented 6 years ago

Yes, it looks like the connection went down due to network issues. rhea has automatic reconnect logic if the connection goes down. It looks like the connection was up again but somehow the cbs auth requests were taking lot of time and timed out. We need to do a better job at reconnecting.

@grs - Do you have any insights on what could be happening over here?

grs commented 6 years ago

From the logs it looks like the connection was closed explicitly, rather than being disconnected. It therefore does not reconnect.

amarzavery commented 6 years ago

It doesn't look like a call being made from the EH SDK for closing the sender. If it was the case then we would have seen log statements from the SDK or from the rhea-promise layer for closing the sender.

It looks like somehow rhea got the close events on sender, session, connection and the container.

grs commented 6 years ago

The close may have originated from the peer (i.e. the server).

amarzavery commented 6 years ago

@LukaszBrygala - azure-event-hubs@0.2.4 has been published with support for handling disconnects and retry logic. Please try the new version and let us know if you find any issues.

amarzavery commented 6 years ago

Sender recovery has been implemented. Hence closing this issue. Please feel free to reopen the issue or open another issue, if the problem persists.