Azure / azure-sdk-for-js

This repository is for active development of the Azure SDK for JavaScript (NodeJS & Browser). For consumers of the SDK we recommend visiting our public developer docs at https://docs.microsoft.com/javascript/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-js.
MIT License

[Service Bus] Investigate performance of Service Bus API #4704

Closed: ramya0820 closed this issue 5 years ago

ramya0820 commented 5 years ago

This issue is to track

ramya0820 commented 5 years ago

About receive,

About send,

cc @AlexGhiondea @ramya-rao-a

AlexGhiondea commented 5 years ago

What is the impact of setting the maxConcurrentCalls to 1000?

What does that impact?

What is the current default?

ramya0820 commented 5 years ago

What is the impact of setting the maxConcurrentCalls to 1000?

On the receive operations, this would result in faster throughput.

What does that impact?

If this were the default for Track 2, receiving would be faster out of the box for new users who did not set the value themselves. Since the value is configurable, it shouldn't impact any other feature. The stress and performance test suites may surface issues: CPU usage may stay high or increase, which would need further investigation.

What is the current default?

The current default is set to 1.

AlexGhiondea commented 5 years ago

Do we know why a default of 1 was selected?

What happens when we have the maxConcurrentCalls set to 1000 -- what are the side-effects of that?

ramya-rao-a commented 5 years ago

Using any value greater than 1 creates unpredictable behavior for users who are not expecting the second message to arrive while the first one is still being processed. We will be revisiting this design in the coming months based on what we learnt in Event Hubs.
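To illustrate the ordering concern, here is a minimal, self-contained sketch. It does not use @azure/service-bus at all: `simulateDispatch` and `Completion` are hypothetical names, and the model simply assumes each message is handed to whichever handler slot frees up first. Under those assumptions, a limit of 1 finishes messages in arrival order, while a limit of 2 lets a fast second message finish before a slow first one.

```typescript
// Hypothetical simulation; none of these names come from @azure/service-bus.
interface Completion {
  id: number;
  startedAt: number;
  finishedAt: number;
}

// durationsMs[i] is how long the user handler takes for message i.
function simulateDispatch(durationsMs: number[], maxConcurrentCalls: number): Completion[] {
  // freeAt[i] is the time at which handler slot i becomes available.
  const freeAt: number[] = [];
  for (let i = 0; i < maxConcurrentCalls; i++) {
    freeAt.push(0);
  }
  const completions: Completion[] = [];
  durationsMs.forEach((duration, id) => {
    // Pick the slot that frees up first; the message starts as soon as it is free.
    let slot = 0;
    for (let i = 1; i < freeAt.length; i++) {
      if (freeAt[i] < freeAt[slot]) slot = i;
    }
    const startedAt = freeAt[slot];
    const finishedAt = startedAt + duration;
    freeAt[slot] = finishedAt;
    completions.push({ id, startedAt, finishedAt });
  });
  // Sort by finish time: the order in which handlers finish.
  return completions.sort((a, b) => a.finishedAt - b.finishedAt || a.id - b.id);
}

const durationsMs = [300, 100, 100]; // message 0 is slow, messages 1 and 2 are fast
const serialOrder = simulateDispatch(durationsMs, 1).map((c) => c.id);
const concurrentOrder = simulateDispatch(durationsMs, 2).map((c) => c.id);
console.log(serialOrder); // [ 0, 1, 2 ]
console.log(concurrentOrder); // [ 1, 2, 0 ]
```

With two slots, message 1 starts at time 0 alongside message 0, which is the overlap a user written for one-at-a-time processing may not expect.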

What I am concerned about at the moment is the reported difference in perf between ReceiveAndDelete mode and PeekLock mode.

ramya0820 commented 5 years ago

The results of using a maxConcurrentCalls value of 100 and comparing the peekLock and receiveAndDelete modes are as follows -

The sample used, tweaked to switch between modes, is below:

// A single ES import replaces the original mix of `import` and `require`.
import { ServiceBusClient, ReceiveMode, ServiceBusMessage } from "@azure/service-bus";

const connectionString =
  "";

const topic = "performance-test-topic";
const subscription = "performance-test-1";

const sleep = (waitTimeInMs: number) => new Promise((resolve) => setTimeout(resolve, waitTimeInMs));

async function main() {
  let messageCounter = 0;

  const ns = ServiceBusClient.createFromConnectionString(connectionString);

  const subscriptionClient = ns.createSubscriptionClient(topic, subscription);

  const receiver = subscriptionClient.createReceiver(ReceiveMode.receiveAndDelete);

  const onMessageHandler = async (brokeredMessage: ServiceBusMessage) => {
    messageCounter++;
    console.log(`Received message: ${messageCounter}`);
    if (messageCounter === 1000) {
      console.timeEnd("receive");
    }
    // Uncomment when testing peekLock mode with auto-complete disabled:
    // await brokeredMessage.complete();
  };
  const onErrorHandler = (err: Error) => {
    console.log("Error occurred: ", err);
  };

  try {
    console.time("receive");
    receiver.registerMessageHandler(onMessageHandler, onErrorHandler, {
      maxConcurrentCalls: 100
    });

    await sleep(500000); // arbitrary delay

    await receiver.close();
    await subscriptionClient.close();
  } finally {
    await ns.close();
  }
}

main().catch((err) => {
  console.log("Error occurred: ", err);
});

ramya-rao-a commented 5 years ago

The difference between auto-complete enabled and disabled in peek lock mode is understandable. We add credit for the next message right after user code is done executing.

When auto-complete is disabled, the user code pays the tax for the time taken to complete the message. When auto-complete is enabled, the library completes the message after the credit has been added.

We do manual credit management because we wanted to control the number of messages in flight at any given point in time instead of flooding the user with messages.
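The effect described above can be sketched with a rough timing model. This is an illustration, not the library's implementation: `simulateReceive`, `CompletionMode`, and the millisecond figures are hypothetical, and the model assumes one message in flight at a time, zero delivery latency once credit is added, and a fixed cost to complete a message.

```typescript
// Hypothetical timing model; not part of @azure/service-bus or rhea.
type CompletionMode = "autoComplete" | "manualComplete";

// Returns the simulated time (ms) at which the last message is settled.
function simulateReceive(
  messageCount: number,
  userMs: number, // time spent in user code per message
  completeMs: number, // assumed cost of completing (settling) one message
  mode: CompletionMode
): number {
  let creditAddedAt = 0; // when credit for the current message was issued
  let lastSettledAt = 0;
  for (let i = 0; i < messageCount; i++) {
    // With manual completion, the handler itself awaits complete(),
    // so the handler does not return until the message is settled.
    const handlerDoneAt =
      creditAddedAt + userMs + (mode === "manualComplete" ? completeMs : 0);
    // With auto-complete, settling happens after the next credit is added,
    // so it overlaps with the next message instead of delaying it.
    lastSettledAt = mode === "autoComplete" ? handlerDoneAt + completeMs : handlerDoneAt;
    // Credit for the next message is added right after the handler returns.
    creditAddedAt = handlerDoneAt;
  }
  return lastSettledAt;
}

// Illustrative numbers only: 1000 messages, 1 ms of user code, 5 ms to complete.
const manualTotalMs = simulateReceive(1000, 1, 5, "manualComplete");
const autoTotalMs = simulateReceive(1000, 1, 5, "autoComplete");
console.log(manualTotalMs); // 6000
console.log(autoTotalMs); // 1005
```

The numbers are not measurements. They only show why, in this model, manual completion adds the completion cost to every message while auto-complete adds it roughly once.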

ramya0820 commented 5 years ago

Agreed, although it's unclear why receiveAndDelete was found to be slower, as it seems to be the fastest of the 3 scenarios.

ramya0820 commented 5 years ago

Following are the results of experiments carried out locally and on VMs to receive 1000 messages:

The results do not reflect any 10x difference. The difference the customer originally perceived seems to come from the default value of maxConcurrentCalls and related defaults in rhea. We will keep the current defaults and investigate this further based on actionable items coming out of the Event Hubs and Service Bus Track 2 work.

Closing, as the performance-related improvements and specific actionable items are being tracked in the Performance Tests epic and in individual customer issues.