dropbox / amqp-coffee

An AMQP 0.9.1 client for Node.js.
MIT License

high latency on VM machines #56

Status: Closed (closed by AVVS 8 years ago)

AVVS commented 8 years ago

This isn't an issue with this library, but I was wondering whether you have run into something similar and already solved it. I've spent 6 hours researching and tuning everything that is tunable, with no luck at all.

Issue is described here: https://github.com/rabbitmq/rabbitmq-server/issues/564

TL;DR: publisher - broker - consumer chain for 1 message takes 20ms on a VM and 0.5ms on my laptop

barshow commented 8 years ago

Is it a persistent queue?

AVVS commented 8 years ago

tried both durable and non-durable queues, and also explicitly set deliveryMode to 1 so that messages are not persistent (initially thought it could be a disk issue)

https://github.com/makeomatic/ms-amqp-transport/blob/master/bench/roundtrip.js - the simple bench I used. There is also a discussion on the forum, though so far it is mostly "look at a tcp dump" suggestions: https://groups.google.com/forum/#!topic/rabbitmq-users/5EkwBt2UdIY

barshow commented 8 years ago

Other dumb questions: same version of node, same version of rabbit? is rabbit running within the vm, or traversing some sort of vm nat? you could try leaving rabbit out of the vm and running node within it, and then reverse that to narrow it down.

i would also look at memory and make sure the vm isn't swapping.

pcap would also be interesting. it's also worth considering that your local machine may be 40 times faster.

AVVS commented 8 years ago

> Other dumb questions: same version of node, same version of rabbit?

yes, node 5.4.1 on both; tried rabbit 3.5.7 and 3.6.0.

> is rabbit running within the vm, or traversing some sort of vm nat? you could try leaving rabbit out of the vm and running node within it, and then reverse that to narrow it down.

tried the following setups:

| local machine | vm |
| --- | --- |
| node, rabbit | |
| node | rabbit |
| | node, rabbit |

Types of VMs I've tried:

  1. google cloud high CPU, 2 cores / 1.8 GB RAM
  2. KVM at colo, 96 GB RAM, 16 cores
  3. virtualbox / docker
  4. virtualbox / alpine-linux

Local machines:

  1. imac 5k
  2. macbook pro 15 inch baseline dated a few years back
  3. ubuntu over same hardware

Swap is an interesting idea, but vm.swappiness (or whatever that kernel setting was called) was set to 1, and with so much RAM in some of the scenarios that shouldn't be happening

The result is the same every time: the vm is ~40ms, the local machine is ~1ms

barshow commented 8 years ago

try vmstat 1 5 on the vm to monitor things

AVVS commented 8 years ago

Here is gzipped pcapng

rabbitmq.dump.pcapng.gz

Typical round-trip sequence takes 38 ms: from packet 91 to 98

AVVS commented 8 years ago

and almost all of the time is lost between packets 93 and 94
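
To locate a stall like this without reading the capture by eye, the per-packet timestamps can be scanned for the largest gap between consecutive packets. The following is a minimal sketch, not something used in this thread: `largestGap` is a hypothetical helper, and the timestamps in `samplePackets` are invented for illustration (only the packet numbers 91 to 98 and the roughly 38 ms total come from the comment above). Timestamps could be exported with something like `tshark -r <file> -T fields -e frame.number -e frame.time_relative`, which prints seconds rather than milliseconds.

```javascript
// Returns the pair of consecutive packets with the largest time gap.
// `packets` is an array of { no, timeMs } in capture order.
function largestGap(packets) {
  if (packets.length < 2) {
    throw new RangeError('need at least two packets');
  }
  let worst = null;
  for (let i = 1; i < packets.length; i += 1) {
    const gapMs = packets[i].timeMs - packets[i - 1].timeMs;
    if (worst === null || gapMs > worst.gapMs) {
      worst = { from: packets[i - 1].no, to: packets[i].no, gapMs };
    }
  }
  return worst;
}

// Hypothetical timestamps, NOT taken from the attached capture.
const samplePackets = [
  { no: 91, timeMs: 0 },
  { no: 92, timeMs: 0.25 },
  { no: 93, timeMs: 0.5 },
  { no: 94, timeMs: 37 },
  { no: 95, timeMs: 37.25 },
  { no: 96, timeMs: 37.5 },
  { no: 97, timeMs: 37.75 },
  { no: 98, timeMs: 38 },
];

console.log(largestGap(samplePackets));
// { from: 93, to: 94, gapMs: 36.5 }
```

Knowing which side sent the packet that ends the gap (client, broker, or a bare TCP ACK) is what tells you where the wait happens.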

barshow commented 8 years ago

what if you try queues that aren't auto-generated, just as a test? I know that shouldn't affect anything

AVVS commented 8 years ago

tried, exactly the same. I've posted more at the rabbitmq-users discussion group: configuration, the vagrantfile for the vm, dumps for both bare metal and the vm on that same server, as well as the net kernel settings.

would love to hear more ideas :+1: and specifically whether you can recreate this too (my friends tried and got exactly the same results as I did)

AVVS commented 8 years ago

```
vagrant@vagrant:~$ vmstat 1 20
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 1762364  26868 117048    0    0    18     2   24   69  1  0 99  0  0
 0  0      0 1762356  26868 117048    0    0     0     0   58  150  1  1 99  0  0
 1  0      0 1680256  26868 117048    0    0     0     0  713 1256 23  3 74  0  0
 0  0      0 1683620  26868 117048    0    0     0    32 1266 1864 43  3 54  0  0
 0  0      0 1681516  26868 117048    0    0     0     0  186  591  4  2 94  0  0
 0  0      0 1680564  26876 117048    0    0     0    32  219  775  4  1 96  0  0
 0  0      0 1678876  26876 117048    0    0     0    36  263  784  4  1 96  0  0
 0  0      0 1677588  26876 117048    0    0     0     0  223  788  3  0 97  0  0
 0  0      0 1676640  26876 117048    0    0     0     0  264  872  3  1 96  0  0
 0  0      0 1762052  26876 117048    0    0     0     0  204  711  3  1 96  0  0
 0  0      0 1762152  26876 117048    0    0     0     0   29   56  0  0 100  0  0
 0  0      0 1762236  26876 117048    0    0     0     0   44  119  0  0 100  0  0
 0  0      0 1762308  26876 117048    0    0     0     0   31   58  1  0 100  0  0
 0  0      0 1762308  26876 117048    0    0     0     0   37   99  0  0 100  0  0
```
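
For longer runs, vmstat output like the above can be checked mechanically instead of by eye. This is a rough sketch, assuming the default Linux vmstat column layout shown above; `parseVmstat` and `findPressure` are hypothetical helper names. Swap activity (`si`/`so`) is what was asked about in this thread; flagging I/O wait (`wa`) and steal time (`st`) as well is an extra assumption about what is worth looking at on a VM.

```javascript
// Parses text printed by `vmstat <delay> <count>` into objects keyed by
// the column names from vmstat's own header line (r, b, swpd, ..., st).
function parseVmstat(text) {
  const lines = text.split('\n').map((line) => line.trim()).filter(Boolean);
  const headerIndex = lines.findIndex((line) => /^r\s+b\s/.test(line));
  if (headerIndex === -1) {
    throw new Error('vmstat column header not found');
  }
  const columns = lines[headerIndex].split(/\s+/);
  return lines
    .slice(headerIndex + 1)
    .map((line) => line.split(/\s+/))
    // Keep only purely numeric rows; this skips repeated header lines.
    .filter((cells) => cells.length === columns.length
      && cells.every((cell) => /^\d+$/.test(cell)))
    .map((cells) => Object.fromEntries(
      columns.map((name, i) => [name, Number(cells[i])]),
    ));
}

// Returns the rows that show swap traffic, I/O wait, or steal time.
function findPressure(rows) {
  return rows.filter((row) => row.si > 0 || row.so > 0 || row.wa > 0 || row.st > 0);
}

// Two rows copied from the vmstat output above.
const vmstatSample = `
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 1762364  26868 117048    0    0    18     2   24   69  1  0 99  0  0
 1  0      0 1680256  26868 117048    0    0     0     0  713 1256 23  3 74  0  0
`;

const vmstatRows = parseVmstat(vmstatSample);
console.log(vmstatRows.length, findPressure(vmstatRows).length);
// 2 0
```

An empty result from `findPressure` matches what the raw output shows: `swpd`, `si`, `so`, `wa` and `st` stay at 0 throughout, so swapping does not look like the cause here.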

```
vagrant@vagrant:~/ms-amqp-transport$ npm run bench

> ms-amqp-transport@ bench /home/vagrant/ms-amqp-transport
> npm run compile && node ./bench/roundtrip.js

> ms-amqp-transport@ compile /home/vagrant/ms-amqp-transport
> babel -d ./lib ./src

src/amqp.js -> lib/amqp.js
src/index.js -> lib/index.js
src/serialization.js -> lib/serialization.js
Messages sent: 104
Mean is 51.97039682539682ms ~15.161607608986849%
Total time is 6.095s 0.051970396825396815s
```

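
A mean on its own can hide whether every round trip is slow or only some of them are. Below is a minimal sketch of a summary that also reports percentiles; it is not what the bench above does (how it computes its percentage is not shown here), `summarize` is a hypothetical helper, and the sample values are invented for illustration.

```javascript
// Summarizes latency samples (in ms): mean, sample standard deviation,
// and nearest-rank percentiles, so occasional spikes stay visible.
function summarize(samplesMs) {
  const n = samplesMs.length;
  if (n < 2) {
    throw new RangeError('need at least two samples');
  }
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const mean = sorted.reduce((sum, x) => sum + x, 0) / n;
  const variance = sorted.reduce((sum, x) => sum + (x - mean) ** 2, 0) / (n - 1);
  // Nearest-rank percentile: the smallest value with at least p% of samples at or below it.
  const percentile = (p) => sorted[Math.max(0, Math.ceil((p / 100) * n) - 1)];
  return {
    count: n,
    mean,
    stddev: Math.sqrt(variance),
    min: sorted[0],
    p50: percentile(50),
    p99: percentile(99),
    max: sorted[n - 1],
  };
}

// Hypothetical samples: four fast round trips and one 40 ms spike.
const latencySummary = summarize([1, 1, 40, 1, 1]);
console.log(latencySummary.mean, latencySummary.p50, latencySummary.max);
// 8.8 1 40
```

If p50 is close to the mean, every message is slow; if p50 is low and only the tail is high, the delay comes from intermittent spikes.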
barshow commented 8 years ago

Did you ever figure this out?

AVVS commented 8 years ago

nope, tested with Wireshark and TCP ACKs spike from time to time, up to 30 ms. couldn't invest more time in it to completely beat it, so the issue still persists