WallarooLabs / pony-kafka

:horse: Pure Pony Kafka client

performance testing/tuning/refactoring #30

Open dipinhora opened 6 years ago

dipinhora commented 6 years ago

The code is mostly unoptimized, except for some performance-oriented design decisions to keep things as asynchronous as possible.

This code should be properly performance tested and tuned as required.

dipinhora commented 6 years ago

The following is an unscientific and incomplete performance comparison of Pony Kafka with librdkafka to get an idea of how far we have to go. It is in no way meant to be definitive, nor is it a real benchmark.

TLDR:

Pony Kafka sends data to Kafka about 5% - 10% slower than librdkafka, but reads data from Kafka about 75% slower than librdkafka. Pony Kafka also uses more cpu than librdkafka (at least part of this is due to how Pony's scheduler threads and work stealing behave).


All testing was done on an i3.8xlarge in AWS using the wallaroo orchestration framework, started with the command:

make cluster cluster_name=dh2 mem_required=30 cpus_required=32 num_followers=0 force_instance=i3.8xlarge spot_bid_factor=100 ansible_system_cpus=0,16 no_spot=true cluster_project_name=wallaroo_dev ansible_install_devtools=true

The following steps were taken after SSHing in:

Clone pony-kafka:

cd ~
git clone https://github.com/WallarooLabs/pony-kafka
cd ~/pony-kafka
git checkout code_improvements_new

Build pony-kafka performance app:

cd ~/pony-kafka
ponyc examples/performance

Clone librdkafka:

cd ~
git clone https://github.com/edenhill/librdkafka
cd ~/librdkafka
git checkout v0.11.3

Build librdkafka performance app:

cd ~/librdkafka
./configure
make examples

Install java/kafka:

~/pony-kafka/misc/kafka/download_kafka_java.sh

Everything was run using sudo cset proc -s user -e bash to minimize system cpu contention via the /user cpu set. Every process was pinned to dedicated cpu cores to avoid thrashing/context switches, and everything was run with realtime priority (cpu assignments/usage was verified using htop).
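For anyone unfamiliar with these tools, the invocation pattern used throughout this issue breaks down roughly as follows; ./some_app is a placeholder binary for illustration, not something in this repo:

```shell
# Move the current shell (and its children) into the /user cpuset so the
# benchmark does not compete with system tasks on cores 0/16 (the cores
# reserved via ansible_system_cpus in the cluster command above):
sudo cset proc -s user -e bash

# From inside that shell, pin a workload to cores 1-5 and run it under
# SCHED_FIFO realtime priority 80 (./some_app is a placeholder):
numactl -C 1-5 chrt -f 80 ./some_app

# Spot-check affinity and scheduling afterwards (htop shows the same):
ps -o pid,psr,rtprio,comm -C some_app
```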

Start zookeeper:

numactl -C 15 chrt -f 80 env KAFKA_HEAP_OPTS="-Xmx40960M -Xms40960M" ~/pony-kafka/misc/kafka/start_zookeeper.sh

Start kafka broker 0:

numactl -C 12-14 chrt -f 80 env KAFKA_HEAP_OPTS="-Xmx40960M -Xms40960M" ~/pony-kafka/misc/kafka/start_kafka_0.sh

Start kafka broker 1:

numactl -C 9-11 chrt -f 80 env KAFKA_HEAP_OPTS="-Xmx40960M -Xms40960M" ~/pony-kafka/misc/kafka/start_kafka_1.sh

Start kafka broker 2:

numactl -C 6-8 chrt -f 80 env KAFKA_HEAP_OPTS="-Xmx40960M -Xms40960M" ~/pony-kafka/misc/kafka/start_kafka_2.sh

Create topic:

~/pony-kafka/misc/kafka/create_replicate_topic.sh

Producing tests (acks = -1):

As described above, everything was run in the /user cpu set, pinned to dedicated cpu cores, with realtime priority.

Each application was run 3 times, alternating between the two. NOTE: Kafka/zookeeper were not restarted between runs.

Run librdkafka performance app in producer mode with acks=-1:

cd ~/librdkafka
numactl -C 1-5 chrt -f 80 ./examples/rdkafka_performance -P -t test -s 100 -c 1000000 -m "_____________Test2:OneBrokers:500kmsgs:100bytes" -S 1 -a -1 -b 127.0.0.1:9092

Results:

Run 1:

% 416665 backpressures for 1000000 produce calls: 41.666% backpressure rate
% 1000000 messages produced (100000000 bytes), 1000000 delivered (offset 0, 0 failed) in 131967ms: 7577 msgs/s and 0.76 MB/s, 416665 produce failures, 0 in queue, no compression

Run 2:

% 407898 backpressures for 1000000 produce calls: 40.790% backpressure rate
% 1000000 messages produced (100000000 bytes), 1000000 delivered (offset 0, 0 failed) in 129668ms: 7711 msgs/s and 0.77 MB/s, 407898 produce failures, 0 in queue, no compression

Run 3:

% 409888 backpressures for 1000000 produce calls: 40.989% backpressure rate
% 1000000 messages produced (100000000 bytes), 1000000 delivered (offset 0, 0 failed) in 131556ms: 7601 msgs/s and 0.76 MB/s, 409888 produce failures, 0 in queue, no compression

Run pony-kafka performance app in producer mode with acks=-1:

cd ~/pony-kafka
numactl -C 1-5 chrt -f 80 ./performance --client_mode producer --produce_message_size 100 --num_messages 1000000 --brokers 127.0.0.1:9092 --produce_acks -1 --topic test --ponythreads 4 --ponyminthreads 4 --ponypinasio --ponynoblock

Results:

Run 1:

2018-01-29 19:04:37: Received acks for all 1000000 messages produced. num_errors: 0. Time taken: 136.187 seconds. Throughput: 7342.85/sec.

Run 2:

2018-01-29 19:09:58: Received acks for all 1000000 messages produced. num_errors: 0. Time taken: 137.69 seconds. Throughput: 7262.71/sec.

Run 3:

2018-01-29 19:15:26: Received acks for all 1000000 messages produced. num_errors: 0. Time taken: 136.954 seconds. Throughput: 7301.71/sec.


Producing tests (acks = 1):

As described above, everything was run in the /user cpu set, pinned to dedicated cpu cores, with realtime priority.

Each application was run 3 times, alternating between the two. NOTE: Kafka/zookeeper were not restarted between runs or since the previous test.

Run librdkafka performance app in producer mode with acks=1:

cd ~/librdkafka
numactl -C 1-5 chrt -f 80 ./examples/rdkafka_performance -P -t test -s 100 -c 1000000 -m "_____________Test2:OneBrokers:500kmsgs:100bytes" -S 1 -a 1 -b 127.0.0.1:9092

Results:

Run 1:

% 365065 backpressures for 1000000 produce calls: 36.506% backpressure rate
% 1000000 messages produced (100000000 bytes), 1000000 delivered (offset 0, 0 failed) in 55392ms: 18052 msgs/s and 1.81 MB/s, 365065 produce failures, 0 in queue, no compression

Run 2:

% 348412 backpressures for 1000000 produce calls: 34.841% backpressure rate
% 1000000 messages produced (100000000 bytes), 1000000 delivered (offset 0, 0 failed) in 54089ms: 18487 msgs/s and 1.85 MB/s, 348412 produce failures, 0 in queue, no compression

Run 3:

% 338523 backpressures for 1000000 produce calls: 33.852% backpressure rate
% 1000000 messages produced (100000000 bytes), 1000000 delivered (offset 0, 0 failed) in 50750ms: 19704 msgs/s and 1.97 MB/s, 338523 produce failures, 0 in queue, no compression

Run pony-kafka performance app in producer mode with acks=1:

cd ~/pony-kafka
numactl -C 1-5 chrt -f 80 ./performance --client_mode producer --produce_message_size 100 --num_messages 1000000 --brokers 127.0.0.1:9092 --produce_acks 1 --topic test --ponythreads 4 --ponyminthreads 4 --ponypinasio --ponynoblock

Results:

Run 1:

2018-01-29 19:19:37: Received acks for all 1000000 messages produced. num_errors: 0. Time taken: 59.0292 seconds. Throughput: 16940.8/sec.

Run 2:

2018-01-29 19:22:19: Received acks for all 1000000 messages produced. num_errors: 0. Time taken: 56.5403 seconds. Throughput: 17686.5/sec.

Run 3:

2018-01-29 19:24:24: Received acks for all 1000000 messages produced. num_errors: 0. Time taken: 57.6605 seconds. Throughput: 17342.9/sec.


Producing tests (acks = 0):

As described above, everything was run in the /user cpu set, pinned to dedicated cpu cores, with realtime priority.

Each application was run 3 times, alternating between the two. NOTE: Kafka/zookeeper were not restarted between runs or since the previous test.

Run librdkafka performance app in producer mode with acks=0:

cd ~/librdkafka
numactl -C 1-5 chrt -f 80 ./examples/rdkafka_performance -P -t test -s 100 -c 1000000 -m "_____________Test2:OneBrokers:500kmsgs:100bytes" -S 1 -a 0 -b 127.0.0.1:9092

Results:

Run 1:

% 177 backpressures for 1000000 produce calls: 0.018% backpressure rate
% 1000000 messages produced (100000000 bytes), 1000000 delivered (offset 0, 0 failed) in 33178ms: 30139 msgs/s and 3.01 MB/s, 177 produce failures, 0 in queue, no compression

Run 2:

% 244 backpressures for 1000000 produce calls: 0.024% backpressure rate
% 1000000 messages produced (100000000 bytes), 1000000 delivered (offset 0, 0 failed) in 31846ms: 31400 msgs/s and 3.14 MB/s, 244 produce failures, 0 in queue, no compression

Run 3:

% 2 backpressures for 1000000 produce calls: 0.000% backpressure rate
% 1000000 messages produced (100000000 bytes), 1000000 delivered (offset 0, 0 failed) in 31666ms: 31579 msgs/s and 3.16 MB/s, 2 produce failures, 0 in queue, no compression

Run pony-kafka performance app in producer mode with acks=0:

cd ~/pony-kafka
numactl -C 1-5 chrt -f 80 ./performance --client_mode producer --produce_message_size 100 --num_messages 1000000 --brokers 127.0.0.1:9092 --produce_acks 0 --topic test --ponythreads 4 --ponyminthreads 4 --ponypinasio --ponynoblock

Results:

Run 1:

2018-01-29 19:26:28: Received acks for all 1000000 messages produced. num_errors: 0. Time taken: 34.5447 seconds. Throughput: 28948/sec.

Run 2:

2018-01-29 19:27:50: Received acks for all 1000000 messages produced. num_errors: 0. Time taken: 31.7206 seconds. Throughput: 31525.3/sec.

Run 3:

2018-01-29 19:29:37: Received acks for all 1000000 messages produced. num_errors: 0. Time taken: 35.2084 seconds. Throughput: 28402.3/sec.


Consuming tests:

As described above, everything was run in the /user cpu set, pinned to dedicated cpu cores, with realtime priority.

Each application was run 3 times, alternating between the two. Prior to running this test, data was loaded into kafka using numactl -C 1-5 chrt -f 80 ./examples/rdkafka_performance -P -t test -s 100 -c 10000000 -m "_____________Test2:OneBrokers:500kmsgs:100bytes" -S 1 -a 0 -b 127.0.0.1:9092. NOTE: Kafka/zookeeper were not restarted between runs or since the previous test.

Run librdkafka performance app in consumer mode:

cd ~/librdkafka
numactl -C 1-5 chrt -f 80 ./examples/rdkafka_performance -C -t test -b 127.0.0.1:9092 -o beginning -c 10000000 -G test1 # use a unique number each time (1,2,3)

Results:

Run 1:

% 10000000 messages (1000000000 bytes) consumed in 8050ms: 1242185 msgs/s (124.22 MB/s)

Run 2:

% 10000000 messages (1000000000 bytes) consumed in 8452ms: 1183124 msgs/s (118.31 MB/s)

Run 3:

% 10000000 messages (1000000000 bytes) consumed in 8261ms: 1210407 msgs/s (121.04 MB/s)

Run pony-kafka performance app in consumer mode:

cd ~/pony-kafka
numactl -C 1-5 chrt -f 80 ./performance --client_mode consumer --num_messages 10000000 --brokers 127.0.0.1:9092 --topic test --ponythreads 4 --ponyminthreads 4 --ponypinasio --ponynoblock

Results:

Run 1:

2018-01-29 19:45:46: Received 10000000 messages as requested. Time taken: 29.58 seconds. Throughput: 338066/sec.

Run 2:

2018-01-29 19:47:19: Received 10000000 messages as requested. Time taken: 29.944 seconds. Throughput: 333957/sec.

Run 3:

2018-01-29 19:47:56: Received 10000000 messages as requested. Time taken: 29.6287 seconds. Throughput: 337511/sec.
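As a sanity check on the TLDR, the three runs in each test above can be averaged and compared; a quick awk sketch over the msgs/s figures reported in this issue:

```shell
# Average the three runs per test (msgs/s, taken from the results above)
# and print Pony Kafka's relative slowdown vs librdkafka.
awk 'function avg(a, b, c) { return (a + b + c) / 3 }
BEGIN {
  # producer, acks=-1
  lrd = avg(7577, 7711, 7601);          pony = avg(7342.85, 7262.71, 7301.71)
  printf "acks=-1: librdkafka %.0f vs pony %.0f (%.1f%% slower)\n", lrd, pony, 100 * (lrd - pony) / lrd
  # producer, acks=1
  lrd = avg(18052, 18487, 19704);       pony = avg(16940.8, 17686.5, 17342.9)
  printf "acks=1:  librdkafka %.0f vs pony %.0f (%.1f%% slower)\n", lrd, pony, 100 * (lrd - pony) / lrd
  # producer, acks=0
  lrd = avg(30139, 31400, 31579);       pony = avg(28948, 31525.3, 28402.3)
  printf "acks=0:  librdkafka %.0f vs pony %.0f (%.1f%% slower)\n", lrd, pony, 100 * (lrd - pony) / lrd
  # consumer
  lrd = avg(1242185, 1183124, 1210407); pony = avg(338066, 333957, 337511)
  printf "consume: librdkafka %.0f vs pony %.0f (%.1f%% slower)\n", lrd, pony, 100 * (lrd - pony) / lrd
}'
```

This works out to roughly 4% - 8% slower producing and about 72% slower consuming, consistent with the summary at the top.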

edenhill commented 6 years ago

Good work on the client and the blog post - a lot of good insight into early stage client development. :+1:

You might want to try rdkafka_performance with -X linger.ms=100 in producer mode. The change of the linger.ms default from 1000ms to 0ms in 0.11.0 gives poor producer performance for the sake of improved latency. A proper fix is scheduled for 0.11.4.
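For reference, applying that suggestion to the acks=1 producer run from earlier in this thread would look something like this (same flags as before, plus the override; untested here):

```shell
cd ~/librdkafka
numactl -C 1-5 chrt -f 80 ./examples/rdkafka_performance -P -t test -s 100 \
  -c 1000000 -S 1 -a 1 -b 127.0.0.1:9092 -X linger.ms=100
```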

dipinhora commented 6 years ago

@edenhill Thank you very much for the kind words. As mentioned in the blog post, librdkafka has been a great source of inspiration for us and we wouldn't be this far along without it.

I'll definitely do another round of testing with -X linger.ms=100 (and a similar change for Pony Kafka). I'll keep an eye out for how you resolve the two competing concerns of latency and throughput. Defaults are hard. 8*/