Getting full 25 Gbps Ethernet on the 2nd interface:
ubuntu@ubuntu:~$ ethtool eno2np1
Settings for eno2np1:
Supported ports: [ FIBRE ]
Supported link modes: 25000baseCR/Full
1000baseX/Full
10000baseCR/Full
Supported pause frame use: Symmetric Receive-only
Supports auto-negotiation: Yes
Supported FEC modes: RS BASER
Advertised link modes: 25000baseCR/Full
1000baseX/Full
10000baseCR/Full
Advertised pause frame use: No
Advertised auto-negotiation: Yes
Advertised FEC modes: Not reported
Speed: 25000Mb/s
Lanes: 1
Duplex: Full
Auto-negotiation: on
Port: Direct Attach Copper
PHYAD: 1
Transceiver: internal
netlink error: Operation not permitted
Current message level: 0x00002081 (8321)
drv tx_err hw
Link detected: yes
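(The `netlink error: Operation not permitted` line is just ethtool being run unprivileged; re-running with sudo should fill in the restricted fields. A quick check, assuming the same interface name:)

```
# Full settings, including fields that need elevated privileges:
sudo ethtool eno2np1

# DAC/transceiver module info, if the NIC exposes it:
sudo ethtool -m eno2np1
```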
If I try running Geekbench 6 I get a core dump, lol:
ubuntu@ubuntu:~/Geekbench-6.3.0-LinuxARMPreview$ ./geekbench6
<jemalloc>: Unsupported system page size
<jemalloc>: Unsupported system page size
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted (core dumped)
I opened up a support issue for that: Can't run Geekbench 6 Arm Preview on AmpereOne 192-core system
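For anyone following along: the jemalloc failure is almost certainly down to the 64K page-size kernel, since Geekbench 6's bundled jemalloc expects 4K pages. A quick way to confirm what the running kernel uses:

```
# Prints 65536 on the -64k kernel, 4096 on the standard kernel.
getconf PAGESIZE
```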
And yes, I know this system is not really an SBC. I still want to test it against Arm SBCs, though ;)
To get `btop` to show the CPU SoC temps instead of `apm_xgene/IO Power`, I went into options (`o`), tabbed to the CPU tab, and under 'Cpu sensor' changed it to `apm_xgene/SoC Temperature`.
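If you want to see which sensors are available before digging through btop's options, something like this should list the hwmon devices (the `xgene` grep assumes lm-sensors is installed and the driver name matches):

```
# List hwmon devices and their driver names:
for d in /sys/class/hwmon/hwmon*; do
  echo "$d: $(cat "$d/name")"
done

# Or, with lm-sensors:
sensors | grep -i -A 2 xgene
```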
Jeff, if time permits could you please check this:
grep CONFIG_ARM64_MTE /boot/config-6.8.0*
Background: the CPU cores should be capable of MTE but your machine doesn't expose the feature via /proc/cpuinfo.
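If the feature were exposed, it would show up in the `Features` line of /proc/cpuinfo; a quick check alongside the kernel config (the config path assumes Ubuntu's naming):

```
# Any mte* flags the kernel exposes to userspace (empty if not exposed):
grep -o 'mte[a-z0-9]*' /proc/cpuinfo | sort -u

# Kernel build support for the currently running kernel:
grep CONFIG_ARM64_MTE "/boot/config-$(uname -r)"
```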
No GPU in it, but can you check it with some AMD/NVIDIA graphics cards?
@hrw - I'd love to find a way to get a test sample of one of AMD's or Nvidia's enterprise server cards; right now the best fit I have is an older Quadro RTX card, but it won't fit in this chassis.
@ThomasKaiser I'll try to run that next time I have the server booted (remind me if I forget next week); I shut it down over the weekend and a boot cycle takes 5-10 minutes, so I'm too lazy to sit and wait today for one command!
@geerlingguy "add pcie x16 riser cable to your shopping list" was my first idea but then I realized that server case would lack power cables for gpu as well.
@hrw - The server actually includes 2x 8-pin PCIe power connections; it's designed for up to one fanless GPU (which needs high-CFM airflow to keep cool).
It looks like one stick of RAM was spewing errors, see https://github.com/geerlingguy/top500-benchmark/issues/43#issuecomment-2441998089
I've re-seated that RAM module (`DIMMF1`), and am going to re-run all benchmarks so far. It is not erroring out now.
@ThomasKaiser:
ubuntu@ubuntu:$ grep CONFIG_ARM64_MTE /boot/config-6.8.0*
/boot/config-6.8.0-39-generic-64k:CONFIG_ARM64_MTE=y
/boot/config-6.8.0-47-generic:CONFIG_ARM64_MTE=y
Attempting qemu-coremark; during setup I'm getting an error: meson setup fails with 'Dependency "glib-2.0" not found'.
Had to install `libglib2.0-dev` manually, then add myself to the `kvm` group, but now the benchmark runs.
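For reference, roughly the setup steps that fixed it (the group change needs a fresh login to take effect):

```
sudo apt install -y libglib2.0-dev
sudo usermod -aG kvm "$USER"
# Log out and back in (or run `newgrp kvm`) before re-running the benchmark.
```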
I noticed when I run `sudo shutdown now`, I get logged out of `ubuntu` and SSH goes away, but then the server won't actually power off (and go into BMC-only mode) for many minutes.
Watching the SOL Console today, I saw tons of errors like:
[ 5261.993963] bnxt_en 0003:02:00.1 eno2np1: Error (timeout: 5000015) msg {0x41 0x3ee0} len:0
[ 5270.120534] {1788}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 2
[ 5270.129045] {1788}[Hardware Error]: It has been corrected by h/w and requires no further action
[ 5270.137729] {1788}[Hardware Error]: event severity: corrected
[ 5270.143461] {1788}[Hardware Error]: Error 0, type: corrected
[ 5270.149193] {1788}[Hardware Error]: section_type: memory error
[ 5270.155186] {1788}[Hardware Error]: error_status: Storage error in DRAM memory (0x0000000000000400)
[ 5270.164478] {1788}[Hardware Error]: node:0 card:5 module:16 device:7
[ 5270.171078] {1788}[Hardware Error]: error_type: 13, scrub corrected error
[ 5270.178026] EDAC MC0: 1 CE scrub corrected error on unknown memory (node:0 card:5 module:16 device:7 page:0x0 offset:0x0 grain:1 syndrome:0x0 - APEI location: node:0 card:5 module:16 device:7 status(0x0000000000000400): Storage error in DRAM memory)
[ 5271.187341] bnxt_en 0003:02:00.1 eno2np1: Error (timeout: 5000015) msg {0x41 0x3ee4} len:0
[ 5280.388425] bnxt_en 0003:02:00.1 eno2np1: Error (timeout: 5000015) msg {0x41 0x3ef0} len:0
So it looks like that DIMM is throwing a bunch of errors, maybe causing the Ethernet driver to throw other errors?
[ 5372.462135] bnxt_en 0003:02:00.1 eno2np1: Resp cmpl intr err msg: 0x51
[ 5372.468653] bnxt_en 0003:02:00.1 eno2np1: hwrm_ring_free type 2 failed. rc:fffffff0 err:0
[ 5381.671651] bnxt_en 0003:02:00.1 eno2np1: Resp cmpl intr err msg: 0x51
[ 5381.678169] bnxt_en 0003:02:00.1 eno2np1: hwrm_ring_free type 2 failed. rc:fffffff0 err:0
...
[ 5417.936638] INFO: task kworker/72:1:1300 blocked for more than 122 seconds.
[ 5417.943594] Tainted: G W 6.8.0-39-generic-64k #39-Ubuntu
[ 5417.950804] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
...
[ 5603.138033] EDAC MC0: 1 CE single-symbol chipkill ECC on P0_Node0_Channel5_Dimm0 DIMMF1 (node:0 card:5 module:16 rank:0 bank_group:3 bank_address:3 device:7 row:1479 column:1216 DIMM location: P0_Node0_Channel5_Dimm0 DIMMF1 page:0x2e3b7 offset:0x3800 grain:1 syndrome:0x0 - APEI location: node:0 card:5 module:16 rank:0 bank_group:3 bank_address:3 device:7 row:1479 column:1216 DIMM location: P0_Node0_Channel5_Dimm0 DIMMF1 status(0x0000000000000400): Storage error in DRAM memory)
... [finally a long time later] ...
[ 5900.617885] reboot: Power down
It's still always `DIMMF1` :)
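If you have rasdaemon running, its per-DIMM error counters make it easy to confirm it's always the same module (a sketch, assuming the `rasdaemon` package on Ubuntu):

```
sudo apt install -y rasdaemon
sudo systemctl enable --now rasdaemon
# Corrected (CE) and uncorrected (UE) error counts per DIMM label:
sudo ras-mc-ctl --error-count
```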
I saw the shutdown of an AmpereOne machine I was testing take a really long time too due to the Broadcom Ethernet driver. But I didn’t see any of the DRAM or APEI issues, so I’m not sure they’re related.
> I saw the shutdown of an AmpereOne machine I was testing take a really long time too due to the Broadcom Ethernet driver.
Hmm, maybe that's it then; those messages kept popping up amidst all the DIMM messages. Might be nice to figure out how to fix the `bnxt_en` driver!
Testing a RAID 0 array of all the NVMe drives following my guide:
ubuntu@ubuntu:~$ sudo mdadm --create --verbose /dev/md0 --level=0 --raid-devices=6 /dev/nvme0n1p1 /dev/nvme1n1p1 /dev/nvme2n1p1 /dev/nvme3n1p1 /dev/nvme5n1p1 /dev/nvme6n1p1
ubuntu@ubuntu:~$ sudo mdadm --detail /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Wed Oct 30 16:37:22 2024
Raid Level : raid0
Array Size : 11251445760 (10.48 TiB 11.52 TB)
Raid Devices : 6
Total Devices : 6
Persistence : Superblock is persistent
Update Time : Wed Oct 30 16:37:22 2024
State : clean
Active Devices : 6
Working Devices : 6
Failed Devices : 0
Spare Devices : 0
Layout : original
Chunk Size : 512K
Consistency Policy : none
Name : ubuntu:0 (local to host ubuntu)
UUID : 6dd22af6:0fd54fa0:9463f73f:636afb4e
Events : 0
Number Major Minor RaidDevice State
0 259 11 0 active sync /dev/nvme0n1p1
1 259 13 1 active sync /dev/nvme1n1p1
2 259 12 2 active sync /dev/nvme2n1p1
3 259 14 3 active sync /dev/nvme3n1p1
4 259 15 4 active sync /dev/nvme5n1p1
5 259 16 5 active sync /dev/nvme6n1p1
ubuntu@ubuntu:~$ sudo mkfs.ext4 -m 0 -E lazy_itable_init=0,lazy_journal_init=0 /dev/md0
ubuntu@ubuntu:~$ sudo mkdir /mnt/raid0
ubuntu@ubuntu:~$ sudo mount /dev/md0 /mnt/raid0
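One extra step if you want the array to re-assemble at boot, roughly:

```
# Persist the array definition and refresh the initramfs:
sudo mdadm --detail --scan | sudo tee -a /etc/mdadm/mdadm.conf
sudo update-initramfs -u
```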
Running my disk benchmark on the array...
| Benchmark | Result |
| --- | --- |
| iozone 4K random read | 58.05 MB/s |
| iozone 4K random write | 250.06 MB/s |
| iozone 1M random read | 5444.03 MB/s |
| iozone 1M random write | 4411.07 MB/s |
| iozone 1M sequential read | 7120.75 MB/s |
| iozone 1M sequential write | 4458.30 MB/s |
Ampere sent over a replacement DIMM, and it seems to have corrected all the memory issues.
However, shutdown is still excruciating: timing this shutdown cycle, it took 15+ minutes, and I just see tons of Ethernet NIC errors (see below for a snippet). Maybe a bug in the `bnxt_en` driver on arm64?
[ 224.516490] infiniband bnxt_re0: Couldn't change QP1 state to INIT: -110
[ 224.523180] infiniband bnxt_re0: Couldn't start port
[ 224.528173] bnxt_en 0003:02:00.0 bnxt_re0: Failed to destroy HW QP
[ 224.534384] ------------[ cut here ]------------
[ 224.538988] WARNING: CPU: 97 PID: 2721 at drivers/infiniband/core/cq.c:322 ib_free_cq+0x13c/0x1d8 [ib_core]
[ 224.548759] Modules linked in: tls xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat bridge stp llc nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables qrtr overlay nls_iso8859_1 bnxt_re(+) ampere_cspmu cfg80211 dax_hmem acpi_ipmi ib_uverbs cxl_acpi ast cxl_core ipmi_ssif arm_cspmu_module arm_spe_pmu i2c_algo_bit ib_core onboard_usb_hub acpi_tad arm_cmn ipmi_msghandler xgene_hwmon cppc_cpufreq sch_fq_codel binfmt_misc dm_multipath nvme_fabrics efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 rndis_host cdc_ether usbnet btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 crct10dif_ce polyval_ce polyval_generic ghash_ce sm4_ce_gcm sm4_ce_ccm sm4_ce sm4_ce_cipher sm4 sm3_ce sm3 sha3_ce sha2_ce nvme sha256_arm64 sha1_ce nvme_core bnxt_en xhci_pci xhci_pci_renesas nvme_auth aes_neon_bs aes_neon_blk aes_ce_blk aes_ce_cipher [last unloaded: ipmi_devintf]
[ 224.637726] CPU: 97 PID: 2721 Comm: (udev-worker) Not tainted 6.8.0-39-generic-64k #39-Ubuntu
[ 224.646237] Hardware name: Supermicro Super Server/R13SPD, BIOS T20241001152934 10/01/2024
[ 224.654487] pstate: 63401009 (nZCv daif +PAN -UAO +TCO +DIT +SSBS BTYPE=--)
[ 224.661437] pc : ib_free_cq+0x13c/0x1d8 [ib_core]
[ 224.666152] lr : ib_mad_port_open+0x220/0x450 [ib_core]
[ 224.671388] sp : ffff80010920f520
[ 224.674690] x29: ffff80010920f520 x28: 0000000000000000 x27: ffffb4c746059120
[ 224.681813] x26: 0000000000000000 x25: ffff0002527e8870 x24: ffff0002527e88f8
[ 224.688936] x23: ffffb4c7465f3e90 x22: 00000000ffffff92 x21: ffffb4c7465fc550
[ 224.696060] x20: ffff000246000000 x19: ffff00015794bc00 x18: ffff8000e8d400f0
[ 224.703182] x17: 0000000000000000 x16: 0000000000000000 x15: 6c6c6174735f7766
[ 224.710305] x14: 0000000000000000 x13: 505120574820796f x12: 7274736564206f74
[ 224.717429] x11: 2064656c69614620 x10: 0000000000000000 x9 : ffffb4c7465c58b0
[ 224.724552] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
[ 224.731675] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
[ 224.738798] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000002
[ 224.745921] Call trace:
[ 224.748355] ib_free_cq+0x13c/0x1d8 [ib_core]
[ 224.752723] ib_mad_port_open+0x220/0x450 [ib_core]
[ 224.757609] ib_mad_init_device+0x78/0x228 [ib_core]
[ 224.762582] add_client_context+0xfc/0x208 [ib_core]
[ 224.767556] enable_device_and_get+0xe0/0x1e0 [ib_core]
[ 224.772790] ib_register_device.part.0+0x130/0x218 [ib_core]
[ 224.778459] ib_register_device+0x38/0x68 [ib_core]
[ 224.783345] bnxt_re_ib_init+0x120/0x238 [bnxt_re]
[ 224.788135] bnxt_re_probe+0x14c/0x268 [bnxt_re]
[ 224.792746] auxiliary_bus_probe+0x50/0x108
[ 224.796920] really_probe+0x1c0/0x420
[ 224.800575] __driver_probe_device+0x94/0x1d8
[ 224.804920] driver_probe_device+0x48/0x188
[ 224.809091] __driver_attach+0x14c/0x2c8
[ 224.813002] bus_for_each_dev+0x88/0x110
[ 224.816913] driver_attach+0x30/0x60
[ 224.820476] bus_add_driver+0x17c/0x2d0
[ 224.824300] driver_register+0x68/0x178
[ 224.828125] __auxiliary_driver_register+0x78/0x148
[ 224.832990] bnxt_re_mod_init+0x54/0xfff8 [bnxt_re]
[ 224.837861] do_one_initcall+0x64/0x3b8
[ 224.841687] do_init_module+0xa0/0x280
[ 224.845425] load_module+0x7b8/0x8f0
[ 224.848988] init_module_from_file+0x98/0x118
[ 224.853332] idempotent_init_module+0x1a4/0x2c8
[ 224.857850] __arm64_sys_finit_module+0x70/0xf8
[ 224.862368] invoke_syscall.constprop.0+0x84/0x100
[ 224.867147] do_el0_svc+0xe4/0x100
[ 224.870536] el0_svc+0x48/0x1c8
[ 224.873673] el0t_64_sync_handler+0x148/0x158
[ 224.878019] el0t_64_sync+0x1b0/0x1b8
[ 224.881670] ---[ end trace 0000000000000000 ]---
[ 224.886282] bnxt_en 0003:02:00.0 bnxt_re0: Free MW failed: 0xffffff92
[ 224.892720] infiniband bnxt_re0: Couldn't open port 1
[ 257.266860] INFO: task (udev-worker):2732 blocked for more than 122 seconds.
[ 257.273911] Tainted: G W 6.8.0-39-generic-64k #39-Ubuntu
[ 257.281123] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 284.760123] systemd-shutdown[1]: Waiting for process: 2721 ((udev-worker)), 2732 ((udev-worker))
[ 326.899586] bnxt_en 0003:02:00.1: QPLIB: bnxt_re_is_fw_stalled: FW STALL Detected. cmdq[0xe]=0x3 waited (101989 > 100000) msec active 1
[ 326.911840] bnxt_en 0003:02:00.1 bnxt_re1: Failed to modify HW QP
[ 326.917924] infiniband bnxt_re1: Couldn't change QP1 state to INIT: -110
[ 326.924614] infiniband bnxt_re1: Couldn't start port
[ 326.929635] bnxt_en 0003:02:00.1 bnxt_re1: Failed to destroy HW QP
[ 326.935847] bnxt_en 0003:02:00.1 bnxt_re1: Free MW failed: 0xffffff92
[ 326.942289] infiniband bnxt_re1: Couldn't open port 1
[ 327.166856] bnxt_en 0003:02:00.1 bnxt_re1: Failed to deinitialize RCFW: 0xffffff92
[ 327.184299] bnxt_en 0003:02:00.0 bnxt_re0: Failed to remove GID: 0xffffff92
[ 327.192669] bnxt_en 0003:02:00.0 bnxt_re0: Failed to deinitialize RCFW: 0xffffff92
[ 338.125977] bnxt_en 0003:02:00.1 eno2np1: Error (timeout: 5000015) msg {0x44 0x69b} len:0
[ 338.134153] bnxt_en 0003:02:00.1 eno2np1: hwrm vnic set tpa failure rc for vnic 2: fffffff0
[ 340.376637] bnxt_en 0003:02:00.0 eno1np0: Error (timeout: 5000015) msg {0xb4 0x5eb} len:0
[ 347.935864] bnxt_en 0003:02:00.1 eno2np1: Error (timeout: 5000015) msg {0x41 0x6a4} len:0
[ 350.701222] bnxt_en 0003:02:00.0 eno1np0: Error (timeout: 5000015) msg {0xb4 0x5f0} len:0
[ 357.708225] bnxt_en 0003:02:00.1 eno2np1: Error (timeout: 5000015) msg {0x41 0x6ae} len:0
[ 360.760027] bnxt_en 0003:02:00.0 eno1np0: Error (timeout: 5000015) msg {0x23 0x5f1} len:0
[ 367.482017] bnxt_en 0003:02:00.1 eno2np1: Error (timeout: 5000015) msg {0x41 0x6b4} len:0
[ 370.821975] bnxt_en 0003:02:00.0 eno1np0: Error (timeout: 5000015) msg {0xb4 0x5f5} len:0
[ 377.268121] bnxt_en 0003:02:00.1 eno2np1: Error (timeout: 5000015) msg {0x41 0x6b7} len:0
[ 381.507812] bnxt_en 0003:02:00.0 eno1np0: Error (timeout: 5000015) msg {0xb4 0x5fa} len:0
[ 387.048016] bnxt_en 0003:02:00.1 eno2np1: Error (timeout: 5000015) msg {0x41 0x6b9} len:0
[ 391.749581] bnxt_en 0003:02:00.0 eno1np0: Error (timeout: 5000015) msg {0xb4 0x5ff} len:0
[ 396.814118] bnxt_en 0003:02:00.1 eno2np1: Error (timeout: 5000015) msg {0x41 0x6c6} len:0
[ 403.018680] bnxt_en 0003:02:00.0 eno1np0: Error (timeout: 5000015) msg {0x23 0x606} len:0
[ 406.578705] bnxt_en 0003:02:00.1 eno2np1: Error (timeout: 5000015) msg {0x41 0x6cf} len:0
...
[ 621.657292] bnxt_en 0003:02:00.1 eno2np1: Resp cmpl intr err msg: 0x51
[ 621.663810] bnxt_en 0003:02:00.1 eno2np1: hwrm_ring_free type 2 failed. rc:fffffff0 err:0
[ 625.054417] bnxt_en 0003:02:00.0 eno1np0: Error (timeout: 5000015) msg {0x37 0x690} len:0
[ 625.911529] INFO: task kworker/34:0:223 blocked for more than 245 seconds.
[ 625.918395] Tainted: G W 6.8.0-39-generic-64k #39-Ubuntu
[ 625.925604] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 631.417059] bnxt_en 0003:02:00.1 eno2np1: Resp cmpl intr err msg: 0x51
[ 631.423577] bnxt_en 0003:02:00.1 eno2np1: hwrm_ring_free type 2 failed. rc:fffffff0 err:0
[ 635.129886] bnxt_en 0003:02:00.0 eno1np0: Error (timeout: 5000015) msg {0x37 0x693} len:0
...
Could you try blacklisting the `bnxt_re` module? To do so, edit `/etc/modprobe.d/blacklist.conf` and add:

`blacklist bnxt_re`
From https://utcc.utoronto.ca/~cks/space/blog/linux/BroadcomNetworkDriverAndRDMA?showcomments :
> The driver stalls during boot and spits out kernel messages like:
>
> bnxt_en 0000:ab:00.0: QPLIB: bnxt_re_is_fw_stalled: FW STALL Detected. cmdq[0xf]=0x3 waited (102721 > 100000) msec active 1
> bnxt_en 0000:ab:00.0 bnxt_re0: Failed to modify HW QP
> infiniband bnxt_re0: Couldn't change QP1 state to INIT: -110
> infiniband bnxt_re0: Couldn't start port
> bnxt_en 0000:ab:00.0 bnxt_re0: Failed to destroy HW QP
> [... more fun ensues ...]
>
> This causes systemd-udev-settle.service to fail:
>
> udevadm[1212]: Timed out for waiting the udev queue being empty.
> systemd[1]: systemd-udev-settle.service: Main process exited, code=exited, status=1/FAILURE
>
> This then causes Ubuntu 24.04's ZFS services to fail to completely start, which is a bad thing on hardware that we want to use for our ZFS fileservers.
>
> We aren't the only people with this problem, so I was able to find various threads on the Internet, for example. These gave me the solution, which is to blacklist the bnxt_re kernel module, but at the time left me with the mystery of how and why the bnxt_re module was even being loaded in the first place.
@bexcran - Will try that, after waiting 15 minutes I just pushed an immediate shutdown so I could finally power cycle.
(On the plus side, power on is much faster now, with a DIMM not spewing out errors all the time.)
@geerlingguy That's what I'd do too! If you want to be a bit nicer to your system/filesystem and you have "Magic SysRq Keys" enabled you can do:
ALT+PrintScreen+s,u,b
That is, press and hold ALT and SysRq (will probably be labeled PrintScr on your keyboard instead of SysRq) while pressing 's', then 'u' then 'b' without letting go of ALT and SysRq.
That'll sync data to disk, attempt to unmount filesystems, and then reboot.
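Since this box is usually driven over SSH/SOL rather than a physical keyboard, the same sequence can be triggered through procfs; note Ubuntu ships a restrictive `kernel.sysrq` bitmask by default, so you may need to open it up first:

```
# Allow all SysRq functions (Ubuntu defaults to a limited bitmask):
sudo sysctl kernel.sysrq=1

# Equivalent of ALT+SysRq+s,u,b without a keyboard:
echo s | sudo tee /proc/sysrq-trigger   # sync filesystems
echo u | sudo tee /proc/sysrq-trigger   # remount read-only
echo b | sudo tee /proc/sysrq-trigger   # reboot immediately
```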
I added `/etc/modprobe.d/blacklist-bnxt.conf` with `blacklist bnxt_re` inside, and rebooted.
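For anyone doing the same, the equivalent one-liner, plus an initramfs refresh in case the module gets pulled in early, and a post-reboot check:

```
echo 'blacklist bnxt_re' | sudo tee /etc/modprobe.d/blacklist-bnxt.conf
sudo update-initramfs -u   # in case bnxt_re loads from the initramfs
sudo reboot

# After reboot, bnxt_en should still be loaded, bnxt_re should not:
lsmod | grep bnxt
```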
Now, it reaches `poweroff.target` within 3 seconds, and `Power down` state after about 12. SOOOOO much nicer lol.
I guess if I ever need InfiniBand over Ethernet, I can figure out that `bnxt_re` module; otherwise, not sure why it would load by default!
@bexcran - Is there any simple way of switching the kernel I'm booting on here? I would like to try the 4K kernel just to see if Geekbench will complete a run, but the default kernel that it's running right now (for performance reasons) is 64K.
@geerlingguy Sorry, I don't know.
I may do a reinstall of the OS on a separate drive just to do that test then.
Also, now that I have my Ampere Altra 32-core NAS server upgraded to 25 Gbps Ethernet: https://github.com/geerlingguy/arm-nas/issues/16
I can finally run the `iperf3` test between these two machines!
ubuntu@ubuntu:~$ iperf3 -c 10.0.2.51
Connecting to host 10.0.2.51, port 5201
[ 5] local 10.0.2.21 port 41304 connected to 10.0.2.51 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 2.73 GBytes 23.4 Gbits/sec 9 1.37 MBytes
[ 5] 1.00-2.00 sec 2.69 GBytes 23.1 Gbits/sec 423 998 KBytes
[ 5] 2.00-3.00 sec 2.22 GBytes 19.1 Gbits/sec 80 737 KBytes
[ 5] 3.00-4.00 sec 1.82 GBytes 15.6 Gbits/sec 93 928 KBytes
[ 5] 4.00-5.00 sec 2.56 GBytes 22.0 Gbits/sec 211 997 KBytes
[ 5] 5.00-6.00 sec 2.57 GBytes 22.1 Gbits/sec 250 765 KBytes
[ 5] 6.00-7.00 sec 2.56 GBytes 22.0 Gbits/sec 128 952 KBytes
[ 5] 7.00-8.00 sec 2.57 GBytes 22.1 Gbits/sec 198 846 KBytes
[ 5] 8.00-9.00 sec 2.58 GBytes 22.2 Gbits/sec 113 1.07 MBytes
[ 5] 9.00-10.00 sec 2.59 GBytes 22.2 Gbits/sec 141 718 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 24.9 GBytes 21.4 Gbits/sec 1646 sender
[ 5] 0.00-10.04 sec 24.9 GBytes 21.3 Gbits/sec receiver
I've noticed some variance; it's hard to tell if it's on the NAS side, the AmpereOne side, or my cloud router. None of them are showing 100% CPU utilization, and watching in `atop`, I don't see any interrupt issues or any other bottleneck.
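One thing worth trying is multiple parallel streams, to see whether a single TCP stream is the limiter:

```
# 4 parallel streams; often steadier than a single stream at 25 GbE:
iperf3 -c 10.0.2.51 -P 4
```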
Testing a copy over SMB from one of the NVMe drives on this system to the NVMe on the HL15:
$ sudo apt install cifs-utils
$ sudo mkdir /mnt/mercury
$ sudo mount -t cifs -o user=jgeerling,uid=$(id -u),gid=$(id -g) //nas01.mmoffice.net/mercury /mnt/mercury
# Inside /mnt/nvme/test, create a large file
$ fallocate -l 100G largefile
# Benchmark file copy over SMB *to* NAS01
ubuntu@ubuntu:/mnt/nvme/test$ rsync --info=progress2 -a largefile /mnt/mercury/test/largefile
107,374,182,400 100% 1.14GB/s 0:01:27 (xfr#1, to-chk=0/1)
# Benchmark file copy over SMB *from* NAS01
ubuntu@ubuntu:/mnt/nvme/test$ rsync --info=progress2 -a /mnt/mercury/test/largefile largefile
107,374,182,400 100% 1.03GB/s 0:01:37 (xfr#1, to-chk=0/1)
Not quite as fast as I was hoping, but this is dealing with SMB + Ethernet + rsync overhead, and I saw it going between 8 and 15 Gbps on the NAS (1.14 GB/s works out to roughly 9 Gbps on the wire). Interesting that the copy back was noticeably slower (about 1 Gbps slower).
Testing with `fio`:
$ fio --name=job-w --rw=write --size=2G --ioengine=libaio --iodepth=4 --bs=128k --direct=1 --filename=bench.file
WRITE: bw=864MiB/s (906MB/s), 864MiB/s-864MiB/s (906MB/s-906MB/s), io=2048MiB (2147MB), run=2370-2370msec
$ fio --name=job-r --rw=read --size=2G --ioengine=libaio --iodepth=4 --bs=128K --direct=1 --filename=bench.file
READ: bw=1267MiB/s (1328MB/s), 1267MiB/s-1267MiB/s (1328MB/s-1328MB/s), io=2048MiB (2147MB), run=1617-1617msec
$ fio --name=job-randw --rw=randwrite --size=2G --ioengine=libaio --iodepth=32 --bs=4k --direct=1 --filename=bench.file
write: IOPS=15.0k, BW=59.4MiB/s (62.3MB/s)(2048MiB/34486msec)
WRITE: bw=59.4MiB/s (62.3MB/s), 59.4MiB/s-59.4MiB/s (62.3MB/s-62.3MB/s), io=2048MiB (2147MB), run=34486-34486msec
$ fio --name=job-randr --rw=randread --size=2G --ioengine=libaio --iodepth=32 --bs=4K --direct=1 --filename=bench.file
read: IOPS=36.4k, BW=142MiB/s (149MB/s)(2048MiB/14398msec)
READ: bw=142MiB/s (149MB/s), 142MiB/s-142MiB/s (149MB/s-149MB/s), io=2048MiB (2147MB), run=14398-14398msec
To switch kernels on Ubuntu, I did the following:
Get a listing of all the installed kernels:
ubuntu@ubuntu:~$ sudo grub-mkconfig | grep -iE "menuentry 'Ubuntu, with Linux" | awk '{print i++ " : "$1, $2, $3, $4, $5, $6, $7}'
Sourcing file `/etc/default/grub'
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-6.8.0-47-generic
Found initrd image: /boot/initrd.img-6.8.0-47-generic
Found linux image: /boot/vmlinuz-6.8.0-39-generic-64k
Found initrd image: /boot/initrd.img-6.8.0-39-generic-64k
Warning: os-prober will not be executed to detect other bootable partitions.
Systems on them will not be added to the GRUB boot configuration.
Check GRUB_DISABLE_OS_PROBER documentation entry.
Adding boot menu entry for UEFI Firmware Settings ...
done
0 : menuentry 'Ubuntu, with Linux 6.8.0-47-generic' --class ubuntu
1 : menuentry 'Ubuntu, with Linux 6.8.0-47-generic (recovery mode)'
2 : menuentry 'Ubuntu, with Linux 6.8.0-39-generic-64k' --class ubuntu
3 : menuentry 'Ubuntu, with Linux 6.8.0-39-generic-64k (recovery mode)'
Edit the GRUB configuration.
$ sudoedit /etc/default/grub
# Set `GRUB_DEFAULT` to `0` to pick the first option / default.
#GRUB_DEFAULT="Advanced options for Ubuntu>Ubuntu, with Linux 6.8.0-39-generic-64k"
GRUB_DEFAULT=0
# Comment out the `GRUB_TIMEOUT_STYLE=hidden` line so it looks like:
#GRUB_TIMEOUT_STYLE=hidden
GRUB_TIMEOUT=0
# After saving the file, run
$ sudo update-grub
$ sudo reboot
Technically I could hit `Esc` (I think? Maybe `Shift`?) during boot, but the timing for that is pretty narrow, so it's nicer to just have the menu appear during boot.
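Alternatively, `grub-reboot` can pick an entry for the next boot only, which would be handy for one-off kernel tests; it needs `GRUB_DEFAULT=saved` in /etc/default/grub, and the entry name below is just the one from the listing above:

```
sudo grub-reboot "Advanced options for Ubuntu>Ubuntu, with Linux 6.8.0-47-generic"
sudo reboot
```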
After reboot:
ubuntu@ubuntu:~$ uname -a
Linux ubuntu 6.8.0-47-generic #47-Ubuntu SMP PREEMPT_DYNAMIC Fri Sep 27 22:03:50 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux
Now that I have the kernel back to 4K page size, I'm running a Geekbench 6 test. I noticed someone else ran one on the same motherboard/CPU in May: https://browser.geekbench.com/v6/cpu/6131970

They got 14435 multicore vs my 15160; single core is spot on at 1309.
Geekbench 6 is horrible for this many cores; it didn't even seem to get halfway up to full multicore performance... Geekbench 5 at least pegs all the cores and hits 600W for some of the tests.
| Geekbench Version | Single Core | Multi Core | Peak Power Consumption |
| --- | --- | --- | --- |
| 6.3.0 Arm preview | 1309 | 15160 | 279 W |
| 5.4.0 Arm preview | 958 | 80639 | 586 W |
Geekbench 6 is on the left: [side-by-side screenshots]
## Basic information

## Linux/system information

## Benchmark results

### CPU

Power:

  - `stress-ng --matrix 0`: 500 W
  - top500 HPL benchmark: 692 W

### Disk
Samsung NVMe SSD - 983 DCT M.2 960GB
Samsung NVMe SSD - MZQL21T9HCJR-00A07
Specs: https://semiconductor.samsung.com/ssd/datacenter-ssd/pm9a3/mzql21t9hcjr-00a07/
Single disk
RAID 0 (mdadm)
### Network

`iperf3` results:

  - `iperf3 -c $SERVER_IP`: 21.4 Gbps
  - `iperf3 -c $SERVER_IP --reverse`: 18.8 Gbps
  - `iperf3 -c $SERVER_IP --bidir`: 8.08 Gbps up, 22.2 Gbps down

Tested on one of the two built-in Broadcom `BCM57414 NetXtreme-E 10Gb/25Gb RDMA Ethernet Controller` interfaces, to my HL15 Arm NAS (see: https://github.com/geerlingguy/arm-nas/issues/16), routed through a Mikrotik 25G Cloud Router.

### GPU
Did not test - this server doesn't have a GPU, just the ASPEED integrated BMC VGA graphics, which are not suitable for much GPU-accelerated gaming or LLMs, lol. Just render it on CPU!
### Memory

`tinymembench` results:
```
tinymembench v0.4.10 (simple benchmark for memory throughput and latency)

==========================================================================
== Memory bandwidth tests                                               ==
==                                                                      ==
== Note 1: 1MB = 1000000 bytes                                          ==
== Note 2: Results for 'copy' tests show how many bytes can be         ==
==         copied per second (adding together read and writen          ==
==         bytes would have provided twice higher numbers)             ==
== Note 3: 2-pass copy means that we are using a small temporary buffer ==
==         to first fetch data into it, and only then write it to the  ==
==         destination (source -> L1 cache, L1 cache -> destination)   ==
== Note 4: If sample standard deviation exceeds 0.1%, it is shown in   ==
==         brackets                                                    ==
==========================================================================

 C copy backwards                            :  14199.7 MB/s (0.3%)
 C copy backwards (32 byte blocks)           :  13871.7 MB/s
 C copy backwards (64 byte blocks)           :  13879.6 MB/s (0.2%)
 C copy                                      :  13890.6 MB/s (0.2%)
 C copy prefetched (32 bytes step)           :  14581.4 MB/s
 C copy prefetched (64 bytes step)           :  14613.8 MB/s
 C 2-pass copy                               :  10819.4 MB/s
 C 2-pass copy prefetched (32 bytes step)    :  11313.6 MB/s
 C 2-pass copy prefetched (64 bytes step)    :  11417.4 MB/s
 C fill                                      :  31260.2 MB/s
 C fill (shuffle within 16 byte blocks)      :  31257.1 MB/s
 C fill (shuffle within 32 byte blocks)      :  31263.1 MB/s
 C fill (shuffle within 64 byte blocks)      :  31260.9 MB/s
 NEON 64x2 COPY                              :  14464.3 MB/s (0.9%)
 NEON 64x2x4 COPY                            :  13694.9 MB/s
 NEON 64x1x4_x2 COPY                         :  12444.6 MB/s
 NEON 64x2 COPY prefetch x2                  :  14886.9 MB/s
 NEON 64x2x4 COPY prefetch x1                :  14954.4 MB/s
 NEON 64x2 COPY prefetch x1                  :  14892.3 MB/s
 NEON 64x2x4 COPY prefetch x1                :  14955.5 MB/s
 ---
 standard memcpy                             :  14141.9 MB/s
 standard memset                             :  31268.0 MB/s
 ---
 NEON LDP/STP copy                           :  13775.1 MB/s (0.7%)
 NEON LDP/STP copy pldl2strm (32 bytes step) :  14267.3 MB/s
 NEON LDP/STP copy pldl2strm (64 bytes step) :  14340.9 MB/s
 NEON LDP/STP copy pldl1keep (32 bytes step) :  14670.0 MB/s
 NEON LDP/STP copy pldl1keep (64 bytes step) :  14644.7 MB/s
 NEON LD1/ST1 copy                           :  13756.1 MB/s
 NEON STP fill                               :  31262.2 MB/s
 NEON STNP fill                              :  31265.7 MB/s
 ARM LDP/STP copy                            :  14454.0 MB/s (0.6%)
 ARM STP fill                                :  31265.6 MB/s
 ARM STNP fill                               :  31266.0 MB/s

==========================================================================
== Memory latency test                                                  ==
==                                                                      ==
== Average time is measured for random memory accesses in the buffers   ==
== of different sizes. The larger is the buffer, the more significant   ==
== are relative contributions of TLB, L1/L2 cache misses and SDRAM      ==
== accesses. For extremely large buffer sizes we are expecting to see   ==
== page table walk with several requests to SDRAM for almost every      ==
== memory access (though 64MiB is not nearly large enough to experience ==
== this effect to its fullest).                                         ==
==                                                                      ==
== Note 1: All the numbers are representing extra time, which needs to  ==
==         be added to L1 cache latency. The cycle timings for L1 cache ==
==         latency can be usually found in the processor documentation. ==
== Note 2: Dual random read means that we are simultaneously performing ==
==         two independent memory accesses at a time. In the case if    ==
==         the memory subsystem can't handle multiple outstanding       ==
==         requests, dual random read has the same timings as two       ==
==         single reads performed one after another.                    ==
==========================================================================

block size : single random read / dual random read, [MADV_NOHUGEPAGE]
      1024 :    0.0 ns          /     0.0 ns
      2048 :    0.0 ns          /     0.0 ns
      4096 :    0.0 ns          /     0.0 ns
      8192 :    0.0 ns          /     0.0 ns
     16384 :    0.0 ns          /     0.0 ns
     32768 :    0.0 ns          /     0.0 ns
     65536 :    0.0 ns          /     0.0 ns
    131072 :    1.1 ns          /     1.6 ns
    262144 :    1.7 ns          /     2.0 ns
    524288 :    1.9 ns          /     2.2 ns
   1048576 :    2.1 ns          /     2.2 ns
   2097152 :    3.0 ns          /     3.3 ns
   4194304 :   22.6 ns          /    33.9 ns
   8388608 :   33.7 ns          /    44.3 ns
  16777216 :   39.3 ns          /    48.0 ns
  33554432 :   42.1 ns          /    49.4 ns
  67108864 :   49.0 ns          /    60.2 ns

block size : single random read / dual random read, [MADV_HUGEPAGE]
      1024 :    0.0 ns          /     0.0 ns
      2048 :    0.0 ns          /     0.0 ns
      4096 :    0.0 ns          /     0.0 ns
      8192 :    0.0 ns          /     0.0 ns
     16384 :    0.0 ns          /     0.0 ns
     32768 :    0.0 ns          /     0.0 ns
     65536 :    0.0 ns          /     0.0 ns
    131072 :    1.1 ns          /     1.6 ns
    262144 :    1.7 ns          /     2.0 ns
    524288 :    1.9 ns          /     2.2 ns
   1048576 :    2.1 ns          /     2.2 ns
   2097152 :    3.0 ns          /     3.3 ns
   4194304 :   22.6 ns          /    33.9 ns
   8388608 :   33.7 ns          /    44.3 ns
  16777216 :   39.3 ns          /    47.9 ns
  33554432 :   42.1 ns          /    49.4 ns
  67108864 :   49.9 ns          /    61.9 ns
```

### sbc-bench results

Run sbc-bench and paste a link to the results here: https://0x0.st/X0gc.bin
See: https://github.com/ThomasKaiser/sbc-bench/issues/105
### Phoronix Test Suite

Results from pi-general-benchmark.sh:

## Additional benchmarks

### QEMU Coremark
The Ampere team has suggested running this, as it emulates running tons of virtual instances with CoreMark inside, a good proxy for the type of performance you can get with VMs/containers on this system: https://github.com/AmpereComputing/qemu-coremark
### llama.cpp (Ampere-optimized)
See: https://github.com/AmpereComputingAI/llama.cpp (I also have an email from Ampere with some testing notes).
### Ollama (generic LLMs)
See: https://github.com/geerlingguy/ollama-benchmark?tab=readme-ov-file#findings
### yolo-v5
See: https://github.com/AmpereComputingAI/yolov5-demo (maybe test it on a 4K60 video, see how it fares).