geerlingguy / turing-pi-cluster

DEPRECATED - Turing Pi cluster configuration for Raspberry Pi Compute Modules
https://www.youtube.com/watch?v=kgVz4-SEhbE
MIT License

Benchmarks on the Turing Pi Cluster #11

Closed by geerlingguy 4 years ago

geerlingguy commented 4 years ago

See related: Test Performance and Functionality on Raspberry Pi OS 64-bit

With the full configuration (excluding NextCloud), I ran every test four times, discarded the first result (unless noted), and averaged the remaining three:

Disk Benchmarks

| Disk | 4K Random Read | 4K Random Write |
| --- | --- | --- |
| CM3+ 8 GB eMMC | 8.99 MB/sec | 9.20 MB/sec |
| CM3+ microSD - Samsung Evo+ 32GB | 6.97 MB/sec | 2.90 MB/sec |
| CM3+ Kingston USB 2.0 SSD | 8.87 MB/sec | 10.04 MB/sec |
| Pi 4 microSD - Samsung Evo+ 32GB | 11.81 MB/sec | 3.25 MB/sec |
| Pi 4 Kingston USB 3.0 SSD w/o UASP | 14.41 MB/sec | 23.28 MB/sec |
| Pi 4 Kingston USB 3.0 SSD w/ UASP | 20.59 MB/sec | 28.54 MB/sec |

(A note on the eMMC: it writes large files at only ~5-10 MB/sec, whereas microSD cards can reach 20-40 MB/sec in some cases. Even so, the eMMC is a much better option for general-purpose computing, since it's more durable and 3x faster than the fastest microSD cards for random IO, and 10-100x faster than the majority of microSD cards I've tested.)

Network Benchmarks

| Configuration | Speed |
| --- | --- |
| Pi 4 2GB in Dramble | 936 Mbps |
| CM3+ in Turing Pi | 95 Mbps |

Note that the Turing Pi cluster does support the full 95 Mbps on each Pi simultaneously. So you can saturate a 1 Gbps connection to the Turing Pi cluster as a whole.

Full System Benchmarks

7-node Turing Pi Cluster

32-bit HypriotOS

| Test | Result |
| --- | --- |
| Drupal, authenticated (ab) | 6.65 req/s |
| Drupal, anonymous (wrk) | 28.29 req/s |
| Wordpress, authenticated (ab) | 21.35 req/s |
| Wordpress, anonymous (wrk) | 25.53 req/s |
| Minecraft, world initialization | 983 seconds¹ (~16.4 min) |

4-node Pi Dramble Cluster (Running K3s, same configs)

32-bit HypriotOS

| Test | Result |
| --- | --- |
| Drupal, authenticated (ab) | 14.12 req/s |
| Drupal, anonymous (wrk) | 85.85 req/s |
| Wordpress, authenticated (ab) | 31.36 req/s |
| Wordpress, anonymous (wrk) | 40.36 req/s |
| Minecraft, world initialization | 407 seconds¹ (~6.8 min) |

32-bit Raspberry Pi OS

| Test | Result |
| --- | --- |
| Drupal, authenticated (ab) | 12.71 req/s |
| Drupal, anonymous (wrk) | 95.87 req/s |
| Wordpress, authenticated (ab) | 39.51 req/s |
| Wordpress, anonymous (wrk) | 49.74 req/s |
| Minecraft, world initialization | 300 seconds (~5 min) |

64-bit Raspberry Pi OS

| Test | Result |
| --- | --- |
| Drupal, authenticated (ab) | 10.84 req/s |
| Drupal, anonymous (wrk) | 73.67 req/s |
| Wordpress, authenticated (ab) | 36.64 req/s |
| Wordpress, anonymous (wrk) | 42.69 req/s |
| Minecraft, world initialization | 400 seconds¹ (~6.7 min) |

¹ This test needs to be re-run after a 15-minute cool-down and run three times in succession. The first time I ran the tests, I accidentally took only the first run instead of averaging the final three runs. Oops.
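The averaging procedure described at the top of this comment (run each test four times, discard the first result, average the rest) can be sketched in a few lines of Python. This is only an illustration, not the script used for these benchmarks: the function name `benchmark_average` and the sample readings are hypothetical.

```python
from statistics import mean


def benchmark_average(results, warmup_runs=1):
    """Average the runs that remain after dropping the first `warmup_runs`."""
    kept = results[warmup_runs:]
    if not kept:
        raise ValueError("need at least one run after discarding warm-up runs")
    return mean(kept)


# Hypothetical req/s readings from four consecutive runs (not measured data).
runs = [10.0, 14.0, 15.0, 16.0]
print(benchmark_average(runs))  # averages only the last three readings
```

Dropping the first run keeps cold caches and container start-up from dragging the average down.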

geerlingguy commented 4 years ago

Caveats: The Turing Pi cluster is running 7 Compute Modules, CM3+ boards with 1 GB of RAM each. The Pi Dramble cluster is running 4 Pi 4 2 GB models. The two clusters used the same exact K3s setup and Turing Pi Cluster configurations.

geerlingguy commented 4 years ago

I did a 15 minute burn-in (running wrk on both the Drupal and Wordpress sites simultaneously) and it seems like the Pis were able to keep their cool—just barely.

[Screenshot: Screen Shot 2020-05-26 at 12 36 22 PM]

Guess which one is worker-03?

[Image: IMG_0004]

geerlingguy commented 4 years ago

Running the same burn-in test on the Dramble cluster (with the same configuration) now. I'll try to remember to grab an IR image as well, 10 minutes in or so.

One interesting takeaway: Wordpress is surprisingly CPU-intensive in its default config for non-authed users, whereas Drupal is way lighter and can serve up 2x the requests using its default caching mechanisms. They're equal when you throw a caching proxy in front or dump to HTML.

[Screenshot: Screen Shot 2020-05-27 at 10 06 43 PM]

[Image: IMG_0002]

geerlingguy commented 4 years ago

So... I just re-tested the Dramble cluster running Raspberry Pi OS's 64-bit beta version, and while Wordpress and Minecraft were faster, Drupal was slower. I'm trying to see if maybe something in Drupal core got way slower in the most recent point release (doubtful), or if maybe Hypriot vs Raspberry Pi OS 32-bit might have some performance issue that Drupal runs into.

To be clear: I'm running the exact same hardware (four 2 GB Pi 4s) with the exact same microSD cards (physically the same: I re-flash the same cards with each OS, so card-to-card variance can't be a factor), and running all tests at least 3 times after a 30-minute warm-up period.

(Benchmarking is hard, but turns up interesting results sometimes.)

geerlingguy commented 4 years ago

I'm also seeing a fair bit of variance in how long it takes Minecraft to generate a new world... so I'm going to have to re-test that a bit.

My guess at why Drupal and Wordpress are slower: on a memory-constrained 64-bit OS, you can't squash as many workers into the same amount of memory. You might fit 3 or 4 workers on a 512 MB instance running Drupal on 32-bit, but only 2 or 3 on 64-bit, because pointers are larger and these PHP applications store a lot of tiny bits of state in memory.
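To make that guess concrete, here is a rough sketch of the arithmetic. Only the 512 MB budget comes from the comment above. The per-worker footprints are hypothetical placeholders chosen to illustrate the effect, not measurements of PHP on either OS.

```python
def max_workers(memory_budget_mb, per_worker_mb):
    """Number of whole workers that fit in the memory budget."""
    return memory_budget_mb // per_worker_mb


BUDGET_MB = 512          # instance size mentioned in the comment
WORKER_32BIT_MB = 140    # hypothetical per-worker footprint on 32-bit
WORKER_64BIT_MB = 190    # hypothetical: larger pointers inflate the footprint

print("32-bit workers:", max_workers(BUDGET_MB, WORKER_32BIT_MB))
print("64-bit workers:", max_workers(BUDGET_MB, WORKER_64BIT_MB))
```

If the guess is right, even a modest increase in per-worker memory can cost a whole worker once the budget is this tight.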

So 64-bit benefits purely-CPU-driven processing (e.g. encodes and such—see https://github.com/geerlingguy/drupal-pi/issues/45), and processes where you have massive amounts of RAM available (e.g. 4 GB or more), but actually presents a bit of a limit with highly-memory-constrained environments (e.g. < 2 GB of RAM).

geerlingguy commented 4 years ago

Just noting this since I'm sure someone will ask: yes, you can max out the bandwidth of all the Turing Pi nodes at the same time, and the onboard Gigabit switch will handle the traffic. As an example, I set up a connection to two Pis and ran iperf to both of them from my Mac (which gets ~930 Mbps to a single Pi 4) at the same time:

$ iperf -c 10.0.100.163 & iperf -c 10.0.100.74
[1] 19606
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 10.0.100.74, TCP port 5001
Client connecting to 10.0.100.163, TCP port 5001
TCP window size:  129 KByte (default)
------------------------------------------------------------
TCP window size:  129 KByte (default)
------------------------------------------------------------
[  4] local 10.0.100.118 port 54978 connected with 10.0.100.74 port 5001
[  4] local 10.0.100.118 port 54977 connected with 10.0.100.163 port 5001
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec   113 MBytes  94.7 Mbits/sec
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec   112 MBytes  93.9 Mbits/sec
[1]  + done       iperf -c 10.0.100.163

I could confirm that the Mac's network stats showed double the throughput during the test (~24 MB/sec when doing both, vs ~12 MB/sec when doing just one).
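As a sanity check, the two iperf readings above can be converted from Mbits/sec to MB/sec and compared with the Mac's network stats. This is a minimal sketch that assumes 8 bits per byte and ignores the decimal-versus-binary unit distinction. The helper name `mbits_to_mbytes` is just for illustration.

```python
def mbits_to_mbytes(mbits_per_sec):
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbits_per_sec / 8


# Bandwidth figures reported by the two simultaneous iperf clients above.
streams_mbits = [94.7, 93.9]

single = mbits_to_mbytes(streams_mbits[0])
combined = mbits_to_mbytes(sum(streams_mbits))

print(f"one stream:  {single:.1f} MB/sec")
print(f"two streams: {combined:.1f} MB/sec")
```

Both values land close to the ~12 MB/sec and ~24 MB/sec observed on the Mac.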

geerlingguy commented 4 years ago

Leaving this open so I can re-run the Minecraft world generation benchmark in the 3 scenarios where I didn't record the correct results.

geerlingguy commented 4 years ago

Brad Manske posted this comment on my YouTube channel:

[Screenshot: Screen Shot 2020-06-22 at 2 25 31 PM]

I didn't even notice—the two Inateck external cases I had were both the older 'non-UASP' type (link on Amazon — note the two options, with/without UASP). So I ordered a UASP type case and will be updating the benchmarks above (and in the next video).

Edit: Holy cow, big improvement!

| Disk | hdparm | dd | 4K Random Read | 4K Random Write |
| --- | --- | --- | --- | --- |
| Pi 4 Kingston USB 3.0 SSD w/o UASP | 172.13 MB/sec | 102.67 MB/sec | 14.41 MB/sec | 23.28 MB/sec |
| Pi 4 Kingston USB 3.0 SSD w/ UASP | 296.71 MB/sec | 149.00 MB/sec | 20.59 MB/sec | 28.54 MB/sec |
| % difference | 53% faster | 37% faster | 35% faster | 20% faster |
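A note on the "% difference" row: the figures appear consistent with the symmetric percent difference (the gap divided by the mean of the two values) rather than the increase relative to the non-UASP baseline. The sketch below computes both from the table values so the two readings can be compared. The function names are illustrative, and which formula was originally used is an inference from the numbers, not something stated in this thread.

```python
def symmetric_percent_difference(a, b):
    """Gap between a and b as a percentage of their mean."""
    return abs(b - a) / ((a + b) / 2) * 100


def relative_increase(baseline, improved):
    """Increase of `improved` over `baseline` as a percentage of the baseline."""
    return (improved - baseline) / baseline * 100


# (without UASP, with UASP) in MB/sec, taken from the table above.
results = {
    "hdparm": (172.13, 296.71),
    "dd": (102.67, 149.00),
    "4K random read": (14.41, 20.59),
    "4K random write": (23.28, 28.54),
}

for name, (without_uasp, with_uasp) in results.items():
    sym = symmetric_percent_difference(without_uasp, with_uasp)
    rel = relative_increase(without_uasp, with_uasp)
    print(f"{name}: {sym:.0f}% symmetric difference, {rel:.0f}% over baseline")
```

Measured against the non-UASP baseline, the improvement is larger than the table row suggests (for example, roughly 72% for hdparm rather than 53%).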

To confirm that the drive is using UASP, check with lsusb -t:

$ lsusb -t
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 5000M
    |__ Port 1: Dev 2, If 0, Class=Mass Storage, Driver=uas, 5000M

It shows Driver=uas for UASP-enabled drives (USB Attached SCSI Protocol), and usb-storage for non-UASP (Bulk-Only Transport/BOT).
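If you want to script that check, the sketch below scans `lsusb -t` output for Mass Storage entries and reports which driver each one is bound to. It assumes the output format shown above. The function names and the regular expression are illustrative, not part of any tool.

```python
import re

# Matches the driver name on Mass Storage lines in `lsusb -t` output.
DRIVER_RE = re.compile(r"Class=Mass Storage, Driver=([\w-]+)")


def mass_storage_drivers(lsusb_tree_output):
    """Return the driver bound to each Mass Storage interface, in order."""
    return DRIVER_RE.findall(lsusb_tree_output)


def uses_uasp(lsusb_tree_output):
    """True if at least one Mass Storage device exists and all use `uas`."""
    drivers = mass_storage_drivers(lsusb_tree_output)
    return bool(drivers) and all(d == "uas" for d in drivers)


# Sample text copied from the lsusb -t output above.
sample = """/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 5000M
    |__ Port 1: Dev 2, If 0, Class=Mass Storage, Driver=uas, 5000M"""

print(mass_storage_drivers(sample))
print(uses_uasp(sample))
```

On a live system, the text could come from `subprocess.run(["lsusb", "-t"], capture_output=True, text=True).stdout` instead of the hard-coded sample.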

Also, regarding power consumption:

Note: Power measurements taken with Pi 4 headless with Pi-FAN plugged in via GPIO (uses ~0.20A), Ethernet, and the USB 3.0 drive. Used a Satechi USB-C power tester.

Good article on UASP from 2015: "What Is A UASP Storage Enclosure?"

geerlingguy commented 4 years ago

Does that same improvement seen on the USB 3.0-native Pi 4 translate at all to USB 2.0 ports on the CM3+?

| Disk | hdparm | dd | 4K Random Read | 4K Random Write |
| --- | --- | --- | --- | --- |
| CM3+ Kingston USB 3.0 SSD w/o UASP | 32.00 MB/sec | 30.40 MB/sec | 8.87 MB/sec | 10.04 MB/sec |
| CM3+ Kingston USB 3.0 SSD w/ UASP | 31.79 MB/sec | 31.70 MB/sec | 7.48 MB/sec | 8.55 MB/sec |

No. Even in the UASP enclosure, the drive was mounted with the usb-storage driver when I tested with the CM PoE Board:

Port 4: Dev 4, If 0, Class=Mass Storage, Driver=usb-storage, 480M

There is some unofficial confirmation that the BCM2835 doesn't support UASP: it lacks 'scatter-gather', which the Linux UASP driver requires.

geerlingguy commented 4 years ago

Benchmarking episode is live (Episode 5 - Benchmarking the Turing Pi), and I referenced it in the README in this commit: https://github.com/geerlingguy/turing-pi-cluster/commit/6191dd74cd1148f33434757c0120396199fafea9

I will likely be doing a little more disk IO testing later as I have some new hardware coming in, but that can wait for a new issue... this one's already a bit overloaded!