diego3g / umbriel

✉️ Umbriel is a simple mailing platform built with Node.js, TypeScript & Prisma.

Performance issues #118

Open diego3g opened 2 years ago

diego3g commented 2 years ago

I sent an email to 18k contacts and noticed some CPU and memory problems.

Summary

Errors

When receiving some AWS events, the HTTP service was logging some errors with this trace:

PrismaClientKnownRequestError2 [PrismaClientKnownRequestError]: Unique constraint failed on the fields: (`message_id`,`contact_id`)
    at cb (/home/node/app/node_modules/@prisma/client/runtime/index.js:34800:17)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:95:5)
    at async PrismaRecipientsRepository.saveWithEvents (/home/node/app/dist/modules/broadcasting/repositories/prisma/PrismaRecipientsRepository.js:41:5)
    at async RegisterEvent.execute (/home/node/app/dist/modules/broadcasting/useCases/RegisterEvent/RegisterEvent.js:60:5)
    at async RegisterEventController.handle (/home/node/app/dist/modules/broadcasting/useCases/RegisterEvent/RegisterEventController.js:90:22)
    at async /home/node/app/dist/infra/http/adapters/ExpressRouteAdapter.js:15:26 {
  code: 'P2002',
  clientVersion: '2.28.0',
  meta: { target: [ 'message_id', 'contact_id' ] }
} 
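P2002 is Prisma's unique-constraint-violation code, so the trace means the same (message_id, contact_id) recipient was written twice, which can happen when the same notification is delivered or processed more than once. A minimal sketch of tolerating that specific error (a hypothetical helper for illustration, not this repo's code, and not the fix eventually adopted later in the thread):

```ts
import { Prisma } from '@prisma/client'

// Hypothetical wrapper: treat a duplicate (message_id, contact_id) write as
// "already processed" instead of letting the P2002 error bubble up.
async function ignoreDuplicateRecipient(save: () => Promise<void>): Promise<void> {
  try {
    await save()
  } catch (err) {
    if (err instanceof Prisma.PrismaClientKnownRequestError && err.code === 'P2002') {
      return // the recipient/event pair already exists, nothing to do
    }
    throw err
  }
}
```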

Remediations

Infrastructure

Application

diego3g commented 2 years ago

Did some benchmarking around queue job insertion using BullMQ and got the following results:

Why is inserting in batch more expensive than inserting one by one? I don't know, but it is.

My guess that the memory usage came from creating a big recipients array on the JavaScript side was wrong; the problem was inserting into the queue.
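For context, a minimal sketch of the two insertion strategies being compared, assuming a BullMQ Queue; the queue name, job name, payload, and connection options are made up for illustration:

```ts
import { Queue } from 'bullmq'

// Hypothetical queue/job names, just for the benchmark sketch.
const queue = new Queue('send-email', { connection: { host: 'localhost', port: 6379 } })

const contactIds = Array.from({ length: 20_000 }, (_, i) => `contact-${i}`)

// Strategy 1: one job per call, one round trip to Redis each time.
async function insertOneByOne() {
  for (const contactId of contactIds) {
    await queue.add('deliver', { contactId })
  }
}

// Strategy 2: a single addBulk() call carrying all 20k jobs at once.
async function insertInBatch() {
  await queue.addBulk(
    contactIds.map((contactId) => ({ name: 'deliver', data: { contactId } }))
  )
}
```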

diego3g commented 2 years ago

Did some benchmarks with Bee Queue, but it didn't result in better performance; actually, it changed nothing.

diego3g commented 2 years ago

Removed two queries when receiving notifications from Amazon SNS.

Now we don't need to check for recipient existence nor create the events separately from upserting the recipient.
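Roughly, that means the recipient upsert can carry the event as a nested write in a single query. A sketch of that shape, with model, relation, and field names assumed from the error trace rather than taken from the actual schema:

```ts
import { PrismaClient } from '@prisma/client'

const prisma = new PrismaClient()

// Hypothetical shape: one upsert that also creates the event, instead of
// an existence check plus a separate event insert.
async function saveRecipientWithEvent(messageId: string, contactId: string, eventType: string) {
  await prisma.recipient.upsert({
    where: {
      // assumes a compound unique index on (message_id, contact_id)
      message_id_contact_id: { message_id: messageId, contact_id: contactId },
    },
    create: {
      message_id: messageId,
      contact_id: contactId,
      events: { create: { type: eventType } },
    },
    update: {
      events: { create: { type: eventType } },
    },
  })
}
```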

diego3g commented 2 years ago

Split the SNS webhook HTTP service in c81033171578d5859383b897fed4b41007bbff76.

pellizzetti commented 2 years ago

@diego3g

Inserting 20k jobs individually on the queue leads to 13mb RAM usage around 430ms (This uses high CPU, but maybe it's better to have higher CPU than memory usage);

Touchy subject: memory is usually considerably cheaper than vCPU [0]. Some numbers around the CPU usage in both tests would be nice!

It's also easier to manage/scale/allocate memory than CPU. Besides all that, applications in Node.js tend to consume more CPU than memory. Since we [currently] work almost exclusively with this technology, depending on the usage of each resource, it may be worth sacrificing some RAM. (:

[0]
Item     Price
vCPU     $0.033174 / vCPU hour
Memory   $0.004446 / GB hour
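At those rates a vCPU-hour costs roughly 7.5x as much as a GB-hour of memory ($0.033174 / $0.004446 ≈ 7.46).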
diego3g commented 2 years ago

Hey @pellizzetti, I tried running some CPU benchmarks but I couldn't manage to get actual numbers for CPU consumption. Do you have any tips for me on that?

pellizzetti commented 2 years ago

@diego3g

If you created new, isolated scripts for your benchmarking, you won't need to handle startup/setup time, so I guess a simple time would do it, e.g. time node <script>. Just a quick reminder: it would be nice to run it a few times to get the avg CPU usage of each script.
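If an isolated script is being used anyway, another option (just a sketch using standard Node APIs, not something from this repo) is to sample process.cpuUsage() around the benchmarked work, which separates CPU time from wall time:

```ts
// Sketch: measure CPU time (user + system) and wall time for a benchmarked function.
// process.cpuUsage() returns microseconds; process.hrtime.bigint() returns nanoseconds.
async function measure(label: string, fn: () => Promise<void>) {
  const startCpu = process.cpuUsage()
  const startWall = process.hrtime.bigint()

  await fn()

  const cpu = process.cpuUsage(startCpu) // delta since startCpu
  const wallMs = Number(process.hrtime.bigint() - startWall) / 1e6

  console.log(
    `${label}: user ${cpu.user / 1000}ms, system ${cpu.system / 1000}ms, wall ${wallMs.toFixed(1)}ms`
  )
}
```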

pellizzetti commented 2 years ago

If you want to take a deeper dive, take some time to look over perf/perf_events[0].

[0] https://www.brendangregg.com/perf.html

Deividy commented 2 years ago

thought this would be a fun quest, decided to jump in and I got your answer :)

Why is inserting in batch more expensive than inserting one by one? I don't know, but it is.

tl;dr: because of extraneous loops that allocate different objects and a lot of strings (JSON.stringify()) before sending to Redis; when inserting only one job you don't allocate all these objects, and you save some resources with fewer iterations.

here is the full story:

this basically loops over all the 20k objects that we just added, checking some flags and building a scripts[] array

then it loops over all the 20k objects again and sends them one by one to Redis

so, there are a bunch of loops over all the data:

in the end, you have a bunch of copies of the original array; every .map() allocates a new array, BullMQ maintains the full queue, and ioredis also maintains the full queue with all objects. that's why this addBulk() is memory expensive, and it's CPU expensive because it has a lot of loops over all the data, and every loop blocks the event loop and the CPU.
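A simplified illustration of that pattern (this is not BullMQ's actual source, just the shape of the allocations being described): every pass over the 20k jobs produces another array or another set of serialized strings before anything reaches Redis.

```ts
// Simplified illustration: each step below walks the full 20k-element array
// and allocates a new copy of the data.
type JobInput = { name: string; data: { contactId: string } }

function addBulkSketch(jobs: JobInput[]) {
  // 1st pass: wrap every job with options/flags (new array of new objects).
  const prepared = jobs.map((job) => ({ ...job, opts: { attempts: 3 } }))

  // 2nd pass: serialize every payload (20k new strings).
  const serialized = prepared.map((job) => JSON.stringify(job.data))

  // 3rd pass: build the commands that will be queued up and sent to Redis,
  // which the Redis client keeps in memory until they are flushed.
  const commands = prepared.map((job, i) => ['queue:add', job.name, serialized[i]])

  return commands
}
```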


performance issues are tricky; you might think that using time will give you the correct time used by the process, or that using cpuUsage() will give the right answers, but most of the time it's not that simple: you will have other processes on the machine, other tasks running in your Node application, and so on.

to understand this kind of issue you must dig into the code and read and understand what folks are doing; the debugger and console.log are your best friends here. external tools will only point at the symptom and will rarely show you the issue. sometimes it's better to actually read the code and fully understand what's going on than to benchmark different modules, since they might be using the same strategy.

a basic ground rule when working with big data in JavaScript:

✌️