Azure / Azurite

A lightweight server clone of Azure Storage that simulates most of the commands supported by it with minimal dependencies
MIT License

Azurite is a lot slower than AWS Emulator MinIO #596

Open aksingh004 opened 3 years ago

aksingh004 commented 3 years ago

I am using the C++ API with Azurite to test performance, and I notice that loading a database table into Azurite or reading a table back from it is 4 to 5 times slower than with the AWS S3 emulator MinIO.

I tried upload_block and upload_text for uploading data and download_to_stream for downloading.

Is Azurite slower overall than real Azure, or am I missing some configuration that would improve performance?

Azurite version - 3.8.0

XiaoningLiu commented 3 years ago

Hi @aksingh004, thanks for the feedback. Can you tell us more about your scenario and why you need this? It helps us better understand customer needs.

By default, Azurite aims to simulate Azure Storage features rather than to hit performance targets. One of Azurite's bottlenecks comes from Node.js, which by default runs as a single process, so all of Azurite's Node.js logic runs on one main thread. You can see that Azurite's CPU usage is limited to roughly one CPU core.

For performance, Azurite introduced support for external SQL metadata storage. In this mode, we can use a Node.js cluster framework such as pm2 to start a cluster of Azurite instances. Requests are distributed across the instances by a load balancer, so the cluster can use all CPU resources on the machine. I have seen speeds of more than 10 Gbps, depending on the hardware.
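
For readers unfamiliar with the cluster approach described above, here is a minimal sketch of the mechanism using Node.js's built-in cluster module (Node 16 or later, for cluster.isPrimary). This is not Azurite code: the run and workerCount helpers and the plain HTTP handler are hypothetical stand-ins, used only to show how several worker processes can accept connections on one shared port.

```javascript
const cluster = require("node:cluster");
const http = require("node:http");
const os = require("node:os");

// Hypothetical helper: clamp the requested worker count to [1, number of CPU cores].
function workerCount(requested, cpuCount = os.cpus().length) {
  return Math.max(1, Math.min(requested, cpuCount));
}

// Hypothetical helper: the primary process forks workers; each worker listens on
// the same port. The cluster module lets the workers share the listening socket,
// so the workers do not collide with EADDRINUSE.
function run(port, requested) {
  if (cluster.isPrimary) {
    const count = workerCount(requested);
    for (let i = 0; i < count; i++) {
      cluster.fork();
    }
    cluster.on("exit", (worker) => {
      console.log(`worker ${worker.process.pid} exited`);
    });
  } else {
    http
      .createServer((req, res) => res.end(`handled by pid ${process.pid}\n`))
      .listen(port);
  }
}
```

Calling run(8080, 4) at the end of the file would start up to four workers behind port 8080. pm2's cluster mode builds on the same module, which is why it needs a Node.js script to load rather than an opaque binary.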

aksingh004 commented 3 years ago

Thank you for your response. This is good information, and since, as you said, Azurite is mainly for functionality testing, that explains the slow performance. Some of the tests that we run against both AWS S3 (MinIO) and Azure (Azurite) fail with a timeout on Azurite.

I am trying to implement your suggestion of running it with pm2, but I have a problem starting it in cluster mode. I read up on it and found that only a Node.js app can run in cluster mode. If I give pm2 a binary file (pm2 start azurite -i 15), it starts in fork mode: only 1 instance starts successfully and the rest fail with an error that the port is already in use.

Please let me know if I am missing something needed to start multiple instances of Azurite.

XiaoningLiu commented 3 years ago

You need pm2's "cluster" mode rather than "fork" mode. See https://stackoverflow.com/questions/49691848/express-server-port-configuration-issue-with-pm2-cluster-mode.

Azurite also requires an additional metadata store so that state can be shared across the different instances. Set the AZURITE_DB environment variable. See https://github.com/Azure/Azurite#customized-metadata-storage-by-external-database-preview. It is still in preview, so any feedback is welcome!
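
As a rough illustration of the two requirements above, a pm2 ecosystem file along these lines could request cluster mode and set AZURITE_DB in one place. Treat it as a sketch under assumptions, not a verified setup: the script path assumes Azurite was installed from npm and that its blob entry point lives at that location (check your own install), the instance count is arbitrary, and the connection string is simply the example used elsewhere in this thread.

```javascript
// ecosystem.config.js (hypothetical file name passed to `pm2 start`)
const config = {
  apps: [
    {
      name: "azurite-blob",
      // ASSUMPTION: JavaScript entry point of an npm-installed Azurite blob service.
      // Verify this path locally; pm2 cluster mode needs a Node.js script, not a binary.
      script: "./node_modules/azurite/dist/src/blob/main.js",
      // Flags as used in this thread; 10000 is Azurite's default blob port.
      args: "--location ./azurite_data --blobHost 0.0.0.0 --blobPort 10000",
      instances: 4, // arbitrary example value
      exec_mode: "cluster", // "fork" would make every instance bind the port itself
      env: {
        // Shared external metadata store (preview feature), example value only.
        AZURITE_DB: "mysql://root:@127.0.0.1:3306/azurite_blob",
      },
    },
  ],
};

module.exports = config;
```

It would be started with pm2 start ecosystem.config.js, after which pm2 list should report the mode of each instance as cluster rather than fork.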

aksingh004 commented 3 years ago

I am now able to start Azurite in cluster mode with SQL metadata support, using a MySQL database. However, it appears to be running into a race condition. In my test, deleting any blob fails on Azurite with the message "Internal Server Error", but there is no such error when I run Azurite without SQL metadata and cluster mode.

Here is the Node.js script (azurite.js) I created. I am posting it here so that other users can make use of it.

const { execFile } = require('child_process');

// The blob port is the first argument after the script name.
const port = process.argv[2];

// Launch the Azurite binary as a child process and print its output on exit.
const child = execFile(
  './azurite',
  ['--location', './azurite_data', '--blobHost', '0.0.0.0', '--blobPort', port, '-d', './azurite_data/debug.log'],
  (err, stdout, stderr) => {
    if (err) {
      throw err;
    }
    console.log(stdout);
  }
);

Then start it with:

pm2 start azurite.js -i 10 -- // this spawns 10 instances of Azurite in cluster mode

To add SQL metadata support, I set the variable below, passing the connection string for the MySQL database server:

export AZURITE_DB=mysql://root:@127.0.0.1:3306/azurite_blob

aksingh004 commented 3 years ago

I looked further and found that Azurite had not started in true cluster mode. Even though "pm2 list" shows all instances as online, it is only the wrapper Node.js script (pasted in my previous comment) that runs in cluster mode. That script spawns a child process to start Azurite; only one instance succeeds in starting and the others fail with a port-in-use error, as follows:

Error: Command failed: ./azurite --location ./azurite_data --blobHost 0.0.0.0 --blobPort 4523 -d ./azurite_data/debug.log
Exit due to unhandled error: listen EADDRINUSE: address already in use 0.0.0.0:4523

Could you please provide a sample script that starts Azurite in cluster mode? That would be very helpful.

$ ./azurite --version
3.8.0
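
One possible workaround for the EADDRINUSE error, when only the binary is available, is to give each wrapper instance its own port instead of sharing one. To be clear, this is not cluster mode but a different technique: it assumes pm2 exposes the instance index in the NODE_APP_INSTANCE environment variable, that a separate load balancer spreads clients across the resulting ports, and that AZURITE_DB is set so the instances share metadata. The helper names (portForInstance, azuriteArgs, startAzurite) are hypothetical.

```javascript
const { spawn } = require("node:child_process");

// Derive a distinct port per instance from the index pm2 is assumed to provide
// in NODE_APP_INSTANCE ("0", "1", ...). Falls back to index 0 outside pm2.
function portForInstance(basePort, env = process.env) {
  const index = Number.parseInt(env.NODE_APP_INSTANCE ?? "0", 10);
  if (!Number.isInteger(index) || index < 0) {
    throw new RangeError(`invalid NODE_APP_INSTANCE: ${env.NODE_APP_INSTANCE}`);
  }
  return basePort + index;
}

// Build the same command-line flags used in the wrapper script from this thread.
function azuriteArgs(port, dataDir) {
  return [
    "--location", dataDir,
    "--blobHost", "0.0.0.0",
    "--blobPort", String(port),
    "-d", `${dataDir}/debug.log`,
  ];
}

// Spawn the Azurite binary on this instance's port and mirror its exit code,
// so pm2 notices when the child dies.
function startAzurite(basePort, dataDir = "./azurite_data") {
  const port = portForInstance(basePort);
  const child = spawn("./azurite", azuriteArgs(port, dataDir), { stdio: "inherit" });
  child.on("exit", (code) => process.exit(code ?? 1));
  return child;
}
```

With startAzurite(4523) called at the end of the wrapper, ten pm2 instances would try ports 4523 through 4532. Whether Azurite behaves correctly with several instances sharing one data directory and one database is exactly the open question in this thread, so this only addresses the port clash.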

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.