AmpereOne A192-32X (Supermicro)

geerlingguy commented 2 weeks ago

Basic information

Board URL (official): https://www.supermicro.com/en/products/system/megadc/2u/ars-211me-fnr
Board purchased from: (Provided by Ampere/Supermicro)
Board purchase date: Oct 14, 2024
Board specs (as tested): A192-32X, 512GB DDR5 (5200 MT/s)
Board price (as tested): (If you have to ask...)

Linux/system information

# output of `screenfetch`
ubuntu@ubuntu:~$ screenfetch 
                          ./+o+-       ubuntu@ubuntu
                  yyyyy- -yyyyyy+      OS: Ubuntu 24.04 noble
               ://+//////-yyyyyyo      Kernel: aarch64 Linux 6.8.0-39-generic-64k
           .++ .:/++++++/-.+sss/`      Uptime: 23m
         .:++o:  /++++++++/:--:/-      Packages: 810
        o:+o+:++.`..```.-/oo+++++/     Shell: bash 5.2.21
       .:+o:+o/.          `+sssoo+/    Disk: 19G / 101G (20%)
  .++/+:+oo+o:`             /sssooo.   CPU: Ampere Ampere-1a @ 192x 3.2GHz
 /+++//+:`oo+o               /::--:.   GPU: ASPEED Technology, Inc. ASPEED Graphics Family (rev 52)
 \+/+o+++`o++o               ++////.   RAM: 31390MiB / 522867MiB
  .++.o+++oo+:`             /dddhhh.  
       .+.o+oo:.          `oddhhhh+   
        \+.++o+o``-````.:ohdhhhhh+    
         `:o+++ `ohhhhhhhhyo++os:     
           .o:`.syhhhhhhh/.oo++o`     
               /osyyyyyyo++ooo+++/    
                   ````` +oo+++o\:    
                          `oo++.     

# output of `uname -a`
Linux ubuntu 6.8.0-39-generic-64k #39-Ubuntu SMP PREEMPT_DYNAMIC Sat Jul  6 11:08:16 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux

Benchmark results

CPU

Geekbench 6: 1309 single / 15160 multi (result lower performance though, see Geekbench w/ page sizes above 4K and STH article)
Geekbench 5: 958 single / 80639 multi (result)
3,027 Gflops at 692W / 4.37 Gflops/W (geerlingguy/top500-benchmark HPL result)

Power

Idle power draw (at wall): 199 W
Maximum simulated power draw (stress-ng --matrix 0): 500 W
During Geekbench multicore benchmark: 300-600 W (depending on Geekbench version)
During top500 HPL benchmark: 692 W

Disk

Samsung NVMe SSD - 983 DCT M.2 960GB

Benchmark	Result
iozone 4K random read	50.35 MB/s
iozone 4K random write	216.04 MB/s
iozone 1M random read	2067.82 MB/s
iozone 1M random write	1295.13 MB/s
iozone 1M sequential read	2098.31 MB/s
iozone 1M sequential write	1291.07 MB/s

wget https://raw.githubusercontent.com/geerlingguy/pi-cluster/master/benchmarks/disk-benchmark.sh
chmod +x disk-benchmark.sh
sudo MOUNT_PATH=/ TEST_SIZE=1g ./disk-benchmark.sh

Samsung NVMe SSD - MZQL21T9HCJR-00A07

Specs: https://semiconductor.samsung.com/ssd/datacenter-ssd/pm9a3/mzql21t9hcjr-00a07/

Single disk

Benchmark	Result
iozone 4K random read	60.19 MB/s
iozone 4K random write	284.72 MB/s
iozone 1M random read	3777.29 MB/s
iozone 1M random write	2686.80 MB/s
iozone 1M sequential read	3773.44 MB/s
iozone 1M sequential write	2680.90 MB/s

RAID 0 (mdadm)

Benchmark	Result
iozone 4K random read	58.05 MB/s
iozone 4K random write	250.06 MB/s
iozone 1M random read	5444.03 MB/s
iozone 1M random write	4411.07 MB/s
iozone 1M sequential read	7120.75 MB/s
iozone 1M sequential write	4458.30 MB/s

Network

iperf3 results:

iperf3 -c $SERVER_IP: 21.4 Gbps
iperf3 -c $SERVER_IP --reverse: 18.8 Gbps
iperf3 -c $SERVER_IP --bidir: 8.08 Gbps up, 22.2 Gbps down

Tested on one of the two built-in Broadcom BCM57414 NetXtreme-E 10Gb/25Gb RDMA Ethernet Controller interfaces, to my HL15 Arm NAS (see: https://github.com/geerlingguy/arm-nas/issues/16), routed through a Mikrotik 25G Cloud Router.

GPU

Did not test - this server doesn't have a GPU, just the ASPEED integrated BMC VGA graphics, which are not suitable for much GPU-accelerated gaming or LLMs, lol. Just render it on CPU!

Memory

tinymembench results:

Click to expand memory benchmark result

``` tinymembench v0.4.10 (simple benchmark for memory throughput and latency) ========================================================================== == Memory bandwidth tests == == == == Note 1: 1MB = 1000000 bytes == == Note 2: Results for 'copy' tests show how many bytes can be == == copied per second (adding together read and writen == == bytes would have provided twice higher numbers) == == Note 3: 2-pass copy means that we are using a small temporary buffer == == to first fetch data into it, and only then write it to the == == destination (source -> L1 cache, L1 cache -> destination) == == Note 4: If sample standard deviation exceeds 0.1%, it is shown in == == brackets == ========================================================================== C copy backwards : 14199.7 MB/s (0.3%) C copy backwards (32 byte blocks) : 13871.7 MB/s C copy backwards (64 byte blocks) : 13879.6 MB/s (0.2%) C copy : 13890.6 MB/s (0.2%) C copy prefetched (32 bytes step) : 14581.4 MB/s C copy prefetched (64 bytes step) : 14613.8 MB/s C 2-pass copy : 10819.4 MB/s C 2-pass copy prefetched (32 bytes step) : 11313.6 MB/s C 2-pass copy prefetched (64 bytes step) : 11417.4 MB/s C fill : 31260.2 MB/s C fill (shuffle within 16 byte blocks) : 31257.1 MB/s C fill (shuffle within 32 byte blocks) : 31263.1 MB/s C fill (shuffle within 64 byte blocks) : 31260.9 MB/s NEON 64x2 COPY : 14464.3 MB/s (0.9%) NEON 64x2x4 COPY : 13694.9 MB/s NEON 64x1x4_x2 COPY : 12444.6 MB/s NEON 64x2 COPY prefetch x2 : 14886.9 MB/s NEON 64x2x4 COPY prefetch x1 : 14954.4 MB/s NEON 64x2 COPY prefetch x1 : 14892.3 MB/s NEON 64x2x4 COPY prefetch x1 : 14955.5 MB/s --- standard memcpy : 14141.9 MB/s standard memset : 31268.0 MB/s --- NEON LDP/STP copy : 13775.1 MB/s (0.7%) NEON LDP/STP copy pldl2strm (32 bytes step) : 14267.3 MB/s NEON LDP/STP copy pldl2strm (64 bytes step) : 14340.9 MB/s NEON LDP/STP copy pldl1keep (32 bytes step) : 14670.0 MB/s NEON LDP/STP copy pldl1keep (64 bytes step) : 14644.7 MB/s NEON LD1/ST1 copy : 13756.1 MB/s NEON STP fill : 31262.2 MB/s NEON STNP fill : 31265.7 MB/s ARM LDP/STP copy : 14454.0 MB/s (0.6%) ARM STP fill : 31265.6 MB/s ARM STNP fill : 31266.0 MB/s ========================================================================== == Memory latency test == == == == Average time is measured for random memory accesses in the buffers == == of different sizes. The larger is the buffer, the more significant == == are relative contributions of TLB, L1/L2 cache misses and SDRAM == == accesses. For extremely large buffer sizes we are expecting to see == == page table walk with several requests to SDRAM for almost every == == memory access (though 64MiB is not nearly large enough to experience == == this effect to its fullest). == == == == Note 1: All the numbers are representing extra time, which needs to == == be added to L1 cache latency. The cycle timings for L1 cache == == latency can be usually found in the processor documentation. == == Note 2: Dual random read means that we are simultaneously performing == == two independent memory accesses at a time. In the case if == == the memory subsystem can't handle multiple outstanding == == requests, dual random read has the same timings as two == == single reads performed one after another. == ========================================================================== block size : single random read / dual random read, [MADV_NOHUGEPAGE] 1024 : 0.0 ns / 0.0 ns 2048 : 0.0 ns / 0.0 ns 4096 : 0.0 ns / 0.0 ns 8192 : 0.0 ns / 0.0 ns 16384 : 0.0 ns / 0.0 ns 32768 : 0.0 ns / 0.0 ns 65536 : 0.0 ns / 0.0 ns 131072 : 1.1 ns / 1.6 ns 262144 : 1.7 ns / 2.0 ns 524288 : 1.9 ns / 2.2 ns 1048576 : 2.1 ns / 2.2 ns 2097152 : 3.0 ns / 3.3 ns 4194304 : 22.6 ns / 33.9 ns 8388608 : 33.7 ns / 44.3 ns 16777216 : 39.3 ns / 48.0 ns 33554432 : 42.1 ns / 49.4 ns 67108864 : 49.0 ns / 60.2 ns block size : single random read / dual random read, [MADV_HUGEPAGE] 1024 : 0.0 ns / 0.0 ns 2048 : 0.0 ns / 0.0 ns 4096 : 0.0 ns / 0.0 ns 8192 : 0.0 ns / 0.0 ns 16384 : 0.0 ns / 0.0 ns 32768 : 0.0 ns / 0.0 ns 65536 : 0.0 ns / 0.0 ns 131072 : 1.1 ns / 1.6 ns 262144 : 1.7 ns / 2.0 ns 524288 : 1.9 ns / 2.2 ns 1048576 : 2.1 ns / 2.2 ns 2097152 : 3.0 ns / 3.3 ns 4194304 : 22.6 ns / 33.9 ns 8388608 : 33.7 ns / 44.3 ns 16777216 : 39.3 ns / 47.9 ns 33554432 : 42.1 ns / 49.4 ns 67108864 : 49.9 ns / 61.9 ns ```

`sbc-bench` results

Run sbc-bench and paste a link to the results here: https://0x0.st/X0gc.bin

See: https://github.com/ThomasKaiser/sbc-bench/issues/105

Phoronix Test Suite

Results from pi-general-benchmark.sh:

pts/encode-mp3: 11.248 sec
pts/x264 4K: 69.49 fps
pts/x264 1080p: 160.75 fps
pts/phpbench: 567108
pts/build-linux-kernel (defconfig): 50.101 sec

Additional benchmarks

QEMU Coremark

The Ampere team have suggested running this, as it will emulate running tons of virtual instances with coremark inside, a good proxy of the type of performance you can get with VMs/containers on this system: https://github.com/AmpereComputing/qemu-coremark

ubuntu@ubuntu:~/qemu-coremark$ ./run_pts.sh 2
47 instances of pts/coremark running in parallel in arm64 VMs!
Round 1 - Total CoreMark Score is: 4697344
Round 2 - Total CoreMark Score is: 4684524

llama.cpp (Ampere-optimized)

See: https://github.com/AmpereComputingAI/llama.cpp (I also have an email from Ampere with some testing notes).

Ollama (generic LLMs)

See: https://github.com/geerlingguy/ollama-benchmark?tab=readme-ov-file#findings

System	CPU/GPU	Model	Eval Rate
AmpereOne A192-32X (192 core - 512GB)	CPU	llama3.2:3b	23.52 Tokens/s
AmpereOne A192-32X (192 core - 512GB)	CPU	llama3.1:8b	17.47 Tokens/s
AmpereOne A192-32X (192 core - 512GB)	CPU	llama3.1:70b	3.86 Tokens/s
AmpereOne A192-32X (192 core - 512GB)	CPU	llama3.1:405b	0.90 Tokens/s

yolo-v5

See: https://github.com/AmpereComputingAI/yolov5-demo (maybe test it on a 4K60 video, see how it fares).

geerlingguy commented 2 weeks ago

Getting full 25 Gbps Ethernet on the 2nd interface:

ubuntu@ubuntu:~$ ethtool eno2np1
Settings for eno2np1:
    Supported ports: [ FIBRE ]
    Supported link modes:   25000baseCR/Full
                            1000baseX/Full
                            10000baseCR/Full
    Supported pause frame use: Symmetric Receive-only
    Supports auto-negotiation: Yes
    Supported FEC modes: RS  BASER
    Advertised link modes:  25000baseCR/Full
                            1000baseX/Full
                            10000baseCR/Full
    Advertised pause frame use: No
    Advertised auto-negotiation: Yes
    Advertised FEC modes: Not reported
    Speed: 25000Mb/s
    Lanes: 1
    Duplex: Full
    Auto-negotiation: on
    Port: Direct Attach Copper
    PHYAD: 1
    Transceiver: internal
netlink error: Operation not permitted
        Current message level: 0x00002081 (8321)
                               drv tx_err hw
    Link detected: yes

If I try running Geekbench 6 I get a core dump, lol:

ubuntu@ubuntu:~/Geekbench-6.3.0-LinuxARMPreview$ ./geekbench6
<jemalloc>: Unsupported system page size
<jemalloc>: Unsupported system page size
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted (core dumped)

I opened up a support issue for that: Can't run Geekbench 6 Arm Preview on AmpereOne 192-core system

geerlingguy commented 2 weeks ago

And yes, I know this system is not really an SBC. I still want to test it against Arm SBCs, though ;)

geerlingguy commented 1 week ago

To get btop to show the CPU SoC temps instead of apm_xgene/IO Power, I went into options o, tabbed to the CPU tab, and under 'Cpu sensor' changed it to apm_xgene/SoC Temperature.

ThomasKaiser commented 1 week ago

Jeff, if time permits could you please check this:

grep CONFIG_ARM64_MTE /boot/config-6.8.0*

Background: the CPU cores should be capable of MTE but your machine doesn't expose the feature via /proc/cpuinfo.

hrw commented 1 week ago

No GPU in it but can you check it with some AMD/NVIDIA graphic cards?

geerlingguy commented 1 week ago

@hrw - I'd love to find a way to get a test sample of one of AMD or Nvidia's enterprise server cards—right now the best fit I have is an older Quadro RTX card, but it won't fit in this chassis.

@ThomasKaiser I'll try to run that next time I have the server booted (remind me if I forget next week); I shut it down over the weekend and a boot cycle takes 5-10 minutes, so I'm too lazy to sit and wait today for one command!

hrw commented 1 week ago

@geerlingguy "add pcie x16 riser cable to your shopping list" was my first idea but then I realized that server case would lack power cables for gpu as well.

geerlingguy commented 1 week ago

@hrw - The server actually includes 2x8 pin PCIe power connections, it's designed for up to 1 fanless GPU (needs high CFM to keep cool).

geerlingguy commented 1 week ago

It looks like one stick of RAM was spewing errors, see https://github.com/geerlingguy/top500-benchmark/issues/43#issuecomment-2441998089

I've re-seated that RAM module (DIMMF1), and am going to re-run all benchmarks so far. It is not erroring out now.

geerlingguy commented 1 week ago

@ThomasKaiser:

ubuntu@ubuntu:$ grep CONFIG_ARM64_MTE /boot/config-6.8.0*
/boot/config-6.8.0-39-generic-64k:CONFIG_ARM64_MTE=y
/boot/config-6.8.0-47-generic:CONFIG_ARM64_MTE=y

geerlingguy commented 1 week ago

Attempting qemu-coremark, during setup I'm getting an error: meson setup fails with 'Dependency "glib-2.0" not found'

geerlingguy commented 1 week ago

Had to install libglib2.0-dev manually, then add myself to the kvm group, but now the benchmark runs.

geerlingguy commented 6 days ago

I noticed when I run sudo shutdown now, I get logged out of ubuntu and SSH goes away, but then the server won't actually power off (and go into BMC-only mode) for many minutes.

Watching the SOL Console today, I saw tons of errors like:

[ 5261.993963] bnxt_en 0003:02:00.1 eno2np1: Error (timeout: 5000015) msg {0x41 0x3ee0} len:0
[ 5270.120534] {1788}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 2
[ 5270.129045] {1788}[Hardware Error]: It has been corrected by h/w and requires no further action
[ 5270.137729] {1788}[Hardware Error]: event severity: corrected
[ 5270.143461] {1788}[Hardware Error]:  Error 0, type: corrected
[ 5270.149193] {1788}[Hardware Error]:   section_type: memory error
[ 5270.155186] {1788}[Hardware Error]:    error_status: Storage error in DRAM memory (0x0000000000000400)
[ 5270.164478] {1788}[Hardware Error]:   node:0 card:5 module:16 device:7 
[ 5270.171078] {1788}[Hardware Error]:   error_type: 13, scrub corrected error
[ 5270.178026] EDAC MC0: 1 CE scrub corrected error on unknown memory (node:0 card:5 module:16 device:7 page:0x0 offset:0x0 grain:1 syndrome:0x0 - APEI location: node:0 card:5 module:16 device:7 status(0x0000000000000400): Storage error in DRAM memory)
[ 5271.187341] bnxt_en 0003:02:00.1 eno2np1: Error (timeout: 5000015) msg {0x41 0x3ee4} len:0
[ 5280.388425] bnxt_en 0003:02:00.1 eno2np1: Error (timeout: 5000015) msg {0x41 0x3ef0} len:0

So it looks like that DIMM is throwing a bunch of errors, maybe causing the Ethernet driver to throw other errors?

[ 5372.462135] bnxt_en 0003:02:00.1 eno2np1: Resp cmpl intr err msg: 0x51
[ 5372.468653] bnxt_en 0003:02:00.1 eno2np1: hwrm_ring_free type 2 failed. rc:fffffff0 err:0
[ 5381.671651] bnxt_en 0003:02:00.1 eno2np1: Resp cmpl intr err msg: 0x51
[ 5381.678169] bnxt_en 0003:02:00.1 eno2np1: hwrm_ring_free type 2 failed. rc:fffffff0 err:0
...
[ 5417.936638] INFO: task kworker/72:1:1300 blocked for more than 122 seconds.
[ 5417.943594]       Tainted: G        W          6.8.0-39-generic-64k #39-Ubuntu
[ 5417.950804] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
...
[ 5603.138033] EDAC MC0: 1 CE single-symbol chipkill ECC on P0_Node0_Channel5_Dimm0 DIMMF1 (node:0 card:5 module:16 rank:0 bank_group:3 bank_address:3 device:7 row:1479 column:1216 DIMM location: P0_Node0_Channel5_Dimm0 DIMMF1 page:0x2e3b7 offset:0x3800 grain:1 syndrome:0x0 - APEI location: node:0 card:5 module:16 rank:0 bank_group:3 bank_address:3 device:7 row:1479 column:1216 DIMM location: P0_Node0_Channel5_Dimm0 DIMMF1 status(0x0000000000000400): Storage error in DRAM memory)
... [finally a long time later] ...
[ 5900.617885] reboot: Power down

It's still always DIMMF1 :)

bexcran commented 6 days ago

I saw the shutdown of an AmpereOne machine I was testing take a really long time too due to the Broadcom Ethernet driver. But I didn’t see any of the DRAM or APEI issues, so I’m not sure they’re related.

geerlingguy commented 6 days ago

I saw the shutdown of an AmpereOne machine I was testing take a really long time too due to the Broadcom Ethernet driver.

Hmm, maybe that's it then — those messages kept popping in amidst all the DIMM messages. Might be nice to figure out how to fix the bnxt_en driver!

geerlingguy commented 6 days ago

Testing a RAID 0 array of all the NVMe drives following my guide:

ubuntu@ubuntu:~$ sudo mdadm --create --verbose /dev/md0 --level=0 --raid-devices=6 /dev/nvme0n1p1 /dev/nvme1n1p1 /dev/nvme2n1p1 /dev/nvme3n1p1 /dev/nvme5n1p1 /dev/nvme6n1p1

ubuntu@ubuntu:~$ sudo mdadm --detail /dev/md0
/dev/md0:
           Version : 1.2
     Creation Time : Wed Oct 30 16:37:22 2024
        Raid Level : raid0
        Array Size : 11251445760 (10.48 TiB 11.52 TB)
      Raid Devices : 6
     Total Devices : 6
       Persistence : Superblock is persistent

       Update Time : Wed Oct 30 16:37:22 2024
             State : clean 
    Active Devices : 6
   Working Devices : 6
    Failed Devices : 0
     Spare Devices : 0

            Layout : original
        Chunk Size : 512K

Consistency Policy : none

              Name : ubuntu:0  (local to host ubuntu)
              UUID : 6dd22af6:0fd54fa0:9463f73f:636afb4e
            Events : 0

    Number   Major   Minor   RaidDevice State
       0     259       11        0      active sync   /dev/nvme0n1p1
       1     259       13        1      active sync   /dev/nvme1n1p1
       2     259       12        2      active sync   /dev/nvme2n1p1
       3     259       14        3      active sync   /dev/nvme3n1p1
       4     259       15        4      active sync   /dev/nvme5n1p1
       5     259       16        5      active sync   /dev/nvme6n1p1

ubuntu@ubuntu:~$ sudo mkfs.ext4 -m 0 -E lazy_itable_init=0,lazy_journal_init=0 /dev/md0
ubuntu@ubuntu:~$ sudo mkdir /mnt/raid0
ubuntu@ubuntu:~$ sudo mount /dev/md0 /mnt/raid0

Running my disk benchmark on the array...

Benchmark	Result
iozone 4K random read	58.05 MB/s
iozone 4K random write	250.06 MB/s
iozone 1M random read	5444.03 MB/s
iozone 1M random write	4411.07 MB/s
iozone 1M sequential read	7120.75 MB/s
iozone 1M sequential write	4458.30 MB/s

geerlingguy commented 5 days ago

Ampere sent over a replacement DIMM, and it seems to have corrected all the memory issues.

However, shutdown is still excruciating — timing this shutdown cycle, it took 15+ minutes, and I just see tons of Ethernet NIC errors (see below for a snippet), maybe a bug in the bnxt_en driver on arm64?

[  224.516490] infiniband bnxt_re0: Couldn't change QP1 state to INIT: -110
[  224.523180] infiniband bnxt_re0: Couldn't start port
[  224.528173] bnxt_en 0003:02:00.0 bnxt_re0: Failed to destroy HW QP
[  224.534384] ------------[ cut here ]------------
[  224.538988] WARNING: CPU: 97 PID: 2721 at drivers/infiniband/core/cq.c:322 ib_free_cq+0x13c/0x1d8 [ib_core]
[  224.548759] Modules linked in: tls xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat bridge stp llc nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables qrtr overlay nls_iso8859_1 bnxt_re(+) ampere_cspmu cfg80211 dax_hmem acpi_ipmi ib_uverbs cxl_acpi ast cxl_core ipmi_ssif arm_cspmu_module arm_spe_pmu i2c_algo_bit ib_core onboard_usb_hub acpi_tad arm_cmn ipmi_msghandler xgene_hwmon cppc_cpufreq sch_fq_codel binfmt_misc dm_multipath nvme_fabrics efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 rndis_host cdc_ether usbnet btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 crct10dif_ce polyval_ce polyval_generic ghash_ce sm4_ce_gcm sm4_ce_ccm sm4_ce sm4_ce_cipher sm4 sm3_ce sm3 sha3_ce sha2_ce nvme sha256_arm64 sha1_ce nvme_core bnxt_en xhci_pci xhci_pci_renesas nvme_auth aes_neon_bs aes_neon_blk aes_ce_blk aes_ce_cipher [last unloaded: ipmi_devintf]
[  224.637726] CPU: 97 PID: 2721 Comm: (udev-worker) Not tainted 6.8.0-39-generic-64k #39-Ubuntu
[  224.646237] Hardware name: Supermicro Super Server/R13SPD, BIOS T20241001152934 10/01/2024
[  224.654487] pstate: 63401009 (nZCv daif +PAN -UAO +TCO +DIT +SSBS BTYPE=--)
[  224.661437] pc : ib_free_cq+0x13c/0x1d8 [ib_core]
[  224.666152] lr : ib_mad_port_open+0x220/0x450 [ib_core]
[  224.671388] sp : ffff80010920f520
[  224.674690] x29: ffff80010920f520 x28: 0000000000000000 x27: ffffb4c746059120
[  224.681813] x26: 0000000000000000 x25: ffff0002527e8870 x24: ffff0002527e88f8
[  224.688936] x23: ffffb4c7465f3e90 x22: 00000000ffffff92 x21: ffffb4c7465fc550
[  224.696060] x20: ffff000246000000 x19: ffff00015794bc00 x18: ffff8000e8d400f0
[  224.703182] x17: 0000000000000000 x16: 0000000000000000 x15: 6c6c6174735f7766
[  224.710305] x14: 0000000000000000 x13: 505120574820796f x12: 7274736564206f74
[  224.717429] x11: 2064656c69614620 x10: 0000000000000000 x9 : ffffb4c7465c58b0
[  224.724552] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
[  224.731675] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
[  224.738798] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000002
[  224.745921] Call trace:
[  224.748355]  ib_free_cq+0x13c/0x1d8 [ib_core]
[  224.752723]  ib_mad_port_open+0x220/0x450 [ib_core]
[  224.757609]  ib_mad_init_device+0x78/0x228 [ib_core]
[  224.762582]  add_client_context+0xfc/0x208 [ib_core]
[  224.767556]  enable_device_and_get+0xe0/0x1e0 [ib_core]
[  224.772790]  ib_register_device.part.0+0x130/0x218 [ib_core]
[  224.778459]  ib_register_device+0x38/0x68 [ib_core]
[  224.783345]  bnxt_re_ib_init+0x120/0x238 [bnxt_re]
[  224.788135]  bnxt_re_probe+0x14c/0x268 [bnxt_re]
[  224.792746]  auxiliary_bus_probe+0x50/0x108
[  224.796920]  really_probe+0x1c0/0x420
[  224.800575]  __driver_probe_device+0x94/0x1d8
[  224.804920]  driver_probe_device+0x48/0x188
[  224.809091]  __driver_attach+0x14c/0x2c8
[  224.813002]  bus_for_each_dev+0x88/0x110
[  224.816913]  driver_attach+0x30/0x60
[  224.820476]  bus_add_driver+0x17c/0x2d0
[  224.824300]  driver_register+0x68/0x178
[  224.828125]  __auxiliary_driver_register+0x78/0x148
[  224.832990]  bnxt_re_mod_init+0x54/0xfff8 [bnxt_re]
[  224.837861]  do_one_initcall+0x64/0x3b8
[  224.841687]  do_init_module+0xa0/0x280
[  224.845425]  load_module+0x7b8/0x8f0
[  224.848988]  init_module_from_file+0x98/0x118
[  224.853332]  idempotent_init_module+0x1a4/0x2c8
[  224.857850]  __arm64_sys_finit_module+0x70/0xf8
[  224.862368]  invoke_syscall.constprop.0+0x84/0x100
[  224.867147]  do_el0_svc+0xe4/0x100
[  224.870536]  el0_svc+0x48/0x1c8
[  224.873673]  el0t_64_sync_handler+0x148/0x158
[  224.878019]  el0t_64_sync+0x1b0/0x1b8
[  224.881670] ---[ end trace 0000000000000000 ]---
[  224.886282] bnxt_en 0003:02:00.0 bnxt_re0: Free MW failed: 0xffffff92
[  224.892720] infiniband bnxt_re0: Couldn't open port 1
[  257.266860] INFO: task (udev-worker):2732 blocked for more than 122 seconds.
[  257.273911]       Tainted: G        W          6.8.0-39-generic-64k #39-Ubuntu
[  257.281123] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  284.760123] systemd-shutdown[1]: Waiting for process: 2721 ((udev-worker)), 2732 ((udev-worker))
[  326.899586] bnxt_en 0003:02:00.1: QPLIB: bnxt_re_is_fw_stalled: FW STALL Detected. cmdq[0xe]=0x3 waited (101989 > 100000) msec active 1 
[  326.911840] bnxt_en 0003:02:00.1 bnxt_re1: Failed to modify HW QP
[  326.917924] infiniband bnxt_re1: Couldn't change QP1 state to INIT: -110
[  326.924614] infiniband bnxt_re1: Couldn't start port
[  326.929635] bnxt_en 0003:02:00.1 bnxt_re1: Failed to destroy HW QP
[  326.935847] bnxt_en 0003:02:00.1 bnxt_re1: Free MW failed: 0xffffff92
[  326.942289] infiniband bnxt_re1: Couldn't open port 1
[  327.166856] bnxt_en 0003:02:00.1 bnxt_re1: Failed to deinitialize RCFW: 0xffffff92
[  327.184299] bnxt_en 0003:02:00.0 bnxt_re0: Failed to remove GID: 0xffffff92
[  327.192669] bnxt_en 0003:02:00.0 bnxt_re0: Failed to deinitialize RCFW: 0xffffff92
[  338.125977] bnxt_en 0003:02:00.1 eno2np1: Error (timeout: 5000015) msg {0x44 0x69b} len:0
[  338.134153] bnxt_en 0003:02:00.1 eno2np1: hwrm vnic set tpa failure rc for vnic 2: fffffff0
[  340.376637] bnxt_en 0003:02:00.0 eno1np0: Error (timeout: 5000015) msg {0xb4 0x5eb} len:0
[  347.935864] bnxt_en 0003:02:00.1 eno2np1: Error (timeout: 5000015) msg {0x41 0x6a4} len:0
[  350.701222] bnxt_en 0003:02:00.0 eno1np0: Error (timeout: 5000015) msg {0xb4 0x5f0} len:0
[  357.708225] bnxt_en 0003:02:00.1 eno2np1: Error (timeout: 5000015) msg {0x41 0x6ae} len:0
[  360.760027] bnxt_en 0003:02:00.0 eno1np0: Error (timeout: 5000015) msg {0x23 0x5f1} len:0
[  367.482017] bnxt_en 0003:02:00.1 eno2np1: Error (timeout: 5000015) msg {0x41 0x6b4} len:0
[  370.821975] bnxt_en 0003:02:00.0 eno1np0: Error (timeout: 5000015) msg {0xb4 0x5f5} len:0
[  377.268121] bnxt_en 0003:02:00.1 eno2np1: Error (timeout: 5000015) msg {0x41 0x6b7} len:0
[  381.507812] bnxt_en 0003:02:00.0 eno1np0: Error (timeout: 5000015) msg {0xb4 0x5fa} len:0
[  387.048016] bnxt_en 0003:02:00.1 eno2np1: Error (timeout: 5000015) msg {0x41 0x6b9} len:0
[  391.749581] bnxt_en 0003:02:00.0 eno1np0: Error (timeout: 5000015) msg {0xb4 0x5ff} len:0
[  396.814118] bnxt_en 0003:02:00.1 eno2np1: Error (timeout: 5000015) msg {0x41 0x6c6} len:0
[  403.018680] bnxt_en 0003:02:00.0 eno1np0: Error (timeout: 5000015) msg {0x23 0x606} len:0
[  406.578705] bnxt_en 0003:02:00.1 eno2np1: Error (timeout: 5000015) msg {0x41 0x6cf} len:0
...
[  621.657292] bnxt_en 0003:02:00.1 eno2np1: Resp cmpl intr err msg: 0x51
[  621.663810] bnxt_en 0003:02:00.1 eno2np1: hwrm_ring_free type 2 failed. rc:fffffff0 err:0
[  625.054417] bnxt_en 0003:02:00.0 eno1np0: Error (timeout: 5000015) msg {0x37 0x690} len:0
[  625.911529] INFO: task kworker/34:0:223 blocked for more than 245 seconds.
[  625.918395]       Tainted: G        W          6.8.0-39-generic-64k #39-Ubuntu
[  625.925604] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  631.417059] bnxt_en 0003:02:00.1 eno2np1: Resp cmpl intr err msg: 0x51
[  631.423577] bnxt_en 0003:02:00.1 eno2np1: hwrm_ring_free type 2 failed. rc:fffffff0 err:0
[  635.129886] bnxt_en 0003:02:00.0 eno1np0: Error (timeout: 5000015) msg {0x37 0x693} len:0
...

bexcran commented 5 days ago

Could you try blacklisting the bnxt_re module? To do so, edit /etc/modprobe.d/blacklist.conf and add:

blacklist bnxt_re

From https://utcc.utoronto.ca/~cks/space/blog/linux/BroadcomNetworkDriverAndRDMA?showcomments :

The driver stalls during boot and spits out kernel messages like:
bnxt_en 0000:ab:00.0: QPLIB: bnxt_re_is_fw_stalled: FW STALL Detected. cmdq[0xf]=0x3 waited (102721 > 100000) msec active 1
bnxt_en 0000:ab:00.0 bnxt_re0: Failed to modify HW QP
infiniband bnxt_re0: Couldn't change QP1 state to INIT: -110
infiniband bnxt_re0: Couldn't start port
bnxt_en 0000:ab:00.0 bnxt_re0: Failed to destroy HW QP
[... more fun ensues ...]
This causes systemd-udev-settle.service to fail:

udevadm[1212]: Timed out for waiting the udev queue being empty. systemd[1]: systemd-udev-settle.service: Main process exited, code=exited, status=1/FAILURE

This then causes Ubuntu 24.04's ZFS services to fail to completely start, which is a bad thing on hardware that we want to >use for our ZFS fileservers.

We aren't the only people with this problem, so I was able to find various threads on the Internet, for example. These gave me the solution, which >is to blacklist the bnxt_re kernel module, but at the time left me with the mystery of how and why the bnxt_re module >was even being loaded in the first place.

geerlingguy commented 5 days ago

@bexcran - Will try that, after waiting 15 minutes I just pushed an immediate shutdown so I could finally power cycle.

(On the plus side, power on is much faster now, with a DIMM not spewing out errors all the time.)

bexcran commented 5 days ago

@geerlingguy That's what I'd do too! If you want to be a bit nicer to your system/filesystem and you have "Magic SysRq Keys" enabled you can do:

ALT+PrintScreen+s,u,b

That is, press and hold ALT and SysRq (will probably be labeled PrintScr on your keyboard instead of SysRq) while pressing 's', then 'u' then 'b' without letting go of ALT and SysRq.

That'll sync data to disk, attempt to unmount filesystems and the reboot.

https://docs.kernel.org/admin-guide/sysrq.html

geerlingguy commented 5 days ago

I added /etc/modprobe.d/blacklist-bnxt.conf with blacklist bnxt_re inside, and rebooted.

Now, it reaches poweroff.target within 3 seconds, and Power down state after about 12. SOOOOO much nicer lol.

I guess if I ever need Infiniband over Ethernet, I can figure out that bnxt_re module, otherwise, not sure why it would load by default!

geerlingguy commented 5 days ago

@bexcran - Is there any simple way of switching the kernel I'm booting on here? I would like to try the 4K kernel just to see if Geekbench will complete a run, but the default kernel that it's running right now (for performance reasons) is 64K.

bexcran commented 5 days ago

@geerlingguy Sorry, I don't know.

geerlingguy commented 5 days ago

I may do a reinstall of the OS on a separate drive just to do that test then.

geerlingguy commented 4 days ago

Also, now that I have my Ampere Altra 32-core NAS server upgraded to 25 Gbps Ethernet: https://github.com/geerlingguy/arm-nas/issues/16

I can finally run the iperf3 test between these two machines!

ubuntu@ubuntu:~$ iperf3 -c 10.0.2.51
Connecting to host 10.0.2.51, port 5201
[  5] local 10.0.2.21 port 41304 connected to 10.0.2.51 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  2.73 GBytes  23.4 Gbits/sec    9   1.37 MBytes       
[  5]   1.00-2.00   sec  2.69 GBytes  23.1 Gbits/sec  423    998 KBytes       
[  5]   2.00-3.00   sec  2.22 GBytes  19.1 Gbits/sec   80    737 KBytes       
[  5]   3.00-4.00   sec  1.82 GBytes  15.6 Gbits/sec   93    928 KBytes       
[  5]   4.00-5.00   sec  2.56 GBytes  22.0 Gbits/sec  211    997 KBytes       
[  5]   5.00-6.00   sec  2.57 GBytes  22.1 Gbits/sec  250    765 KBytes       
[  5]   6.00-7.00   sec  2.56 GBytes  22.0 Gbits/sec  128    952 KBytes       
[  5]   7.00-8.00   sec  2.57 GBytes  22.1 Gbits/sec  198    846 KBytes       
[  5]   8.00-9.00   sec  2.58 GBytes  22.2 Gbits/sec  113   1.07 MBytes       
[  5]   9.00-10.00  sec  2.59 GBytes  22.2 Gbits/sec  141    718 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  24.9 GBytes  21.4 Gbits/sec  1646             sender
[  5]   0.00-10.04  sec  24.9 GBytes  21.3 Gbits/sec                  receiver

I've noticed some variances—hard to tell if it's on the NAS side, the AmpereOne side, or my cloud router. None of them are showing 100% CPU utilization, and watching on atop, I don't see any interrupt issues or any other bottleneck.

geerlingguy commented 4 days ago

Testing a copy over SMB from one of the NVMe on this system to the NVMe on the HL15:

$ sudo apt install cifs-utils
$ sudo mkdir /mnt/mercury
$ sudo mount -t cifs -o user=jgeerling,uid=$(id -u),gid=$(id -g) //nas01.mmoffice.net/mercury /mnt/mercury

# Inside /mnt/nvme/test, create a large file
$ fallocate -l 100G largefile

# Benchmark file copy over SMB *to* NAS01
ubuntu@ubuntu:/mnt/nvme/test$ rsync --info=progress2 -a largefile /mnt/mercury/test/largefile
107,374,182,400 100%    1.14GB/s    0:01:27 (xfr#1, to-chk=0/1)

# Benchmark file copy over SMB *from* NAS01
ubuntu@ubuntu:/mnt/nvme/test$ rsync --info=progress2 -a /mnt/mercury/test/largefile largefile
107,374,182,400 100%    1.03GB/s    0:01:37 (xfr#1, to-chk=0/1)

Not quite as fast as I was hoping, but this is dealing with SMB + Ethernet + rsync overhead, and I saw it going between 8-15 Gbps on the NAS. Interesting that the copy back was noticeably slower (about 1 Gbps slower).

Testing with fio:

$ fio --name=job-w --rw=write --size=2G --ioengine=libaio --iodepth=4 --bs=128k --direct=1 --filename=bench.file
WRITE: bw=864MiB/s (906MB/s), 864MiB/s-864MiB/s (906MB/s-906MB/s), io=2048MiB (2147MB), run=2370-2370msec

$ fio --name=job-r --rw=read --size=2G --ioengine=libaio --iodepth=4 --bs=128K --direct=1 --filename=bench.file
READ: bw=1267MiB/s (1328MB/s), 1267MiB/s-1267MiB/s (1328MB/s-1328MB/s), io=2048MiB (2147MB), run=1617-1617msec

$ fio --name=job-randw --rw=randwrite --size=2G --ioengine=libaio --iodepth=32 --bs=4k --direct=1 --filename=bench.file
read: IOPS=15.0k, BW=59.4MiB/s (62.3MB/s)(2048MiB/34486msec)
WRITE: bw=59.4MiB/s (62.3MB/s), 59.4MiB/s-59.4MiB/s (62.3MB/s-62.3MB/s), io=2048MiB (2147MB), run=34486-34486msec

$ fio --name=job-randr --rw=randread --size=2G --ioengine=libaio --iodepth=32 --bs=4K --direct=1 --filename=bench.file
read: IOPS=36.4k, BW=142MiB/s (149MB/s)(2048MiB/14398msec)
READ: bw=142MiB/s (149MB/s), 142MiB/s-142MiB/s (149MB/s-149MB/s), io=2048MiB (2147MB), run=14398-14398msec

geerlingguy commented 4 days ago

To switch kernels on Ubuntu, I did the following:

Get a listing of all the installed kernels:

ubuntu@ubuntu:~$ sudo grub-mkconfig | grep -iE "menuentry 'Ubuntu, with Linux" | awk '{print i++ " : "$1, $2, $3, $4, $5, $6, $7}'
Sourcing file `/etc/default/grub'
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-6.8.0-47-generic
Found initrd image: /boot/initrd.img-6.8.0-47-generic
Found linux image: /boot/vmlinuz-6.8.0-39-generic-64k
Found initrd image: /boot/initrd.img-6.8.0-39-generic-64k
Warning: os-prober will not be executed to detect other bootable partitions.
Systems on them will not be added to the GRUB boot configuration.
Check GRUB_DISABLE_OS_PROBER documentation entry.
Adding boot menu entry for UEFI Firmware Settings ...
done
0 : menuentry 'Ubuntu, with Linux 6.8.0-47-generic' --class ubuntu
1 : menuentry 'Ubuntu, with Linux 6.8.0-47-generic (recovery mode)'
2 : menuentry 'Ubuntu, with Linux 6.8.0-39-generic-64k' --class ubuntu
3 : menuentry 'Ubuntu, with Linux 6.8.0-39-generic-64k (recovery mode)'

Edit the Grub configuration.

$ sudoedit /etc/default/grub

# Set `GRUB_DEFAULT` to `0` to pick the first option / default.
#GRUB_DEFAULT="Advanced options for Ubuntu>Ubuntu, with Linux 6.8.0-39-generic-64k"
GRUB_DEFAULT=0

# Comment out the `GRUB_TIMEOUT_STYLE=hidden` line so it looks like:
#GRUB_TIMEOUT_STYLE=hidden
GRUB_TIMEOUT=0

# After saving the file, run
$ sudo update-grub
$ sudo reboot

Technically I could hit Esc (I think? Maybe Shift?) during boot, but the timing for that is pretty narrow, so it's nicer to just have the menu appear during boot.

After reboot:

ubuntu@ubuntu:~$ uname -a
Linux ubuntu 6.8.0-47-generic #47-Ubuntu SMP PREEMPT_DYNAMIC Fri Sep 27 22:03:50 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux

geerlingguy commented 4 days ago

Now that I have the kernel back to 4k page size, I am running Geekbench 6 test. I noticed someone else ran one on the same motherboard/CPU in May: https://browser.geekbench.com/v6/cpu/6131970

14435 multicore vs my 15160. Single core spot on at 1309.

Geekbench 6 is horrible for this many cores—it didn't seem to even get halfway up to full multicore performance... Geekbench 5 at least pegs all the cores and hits 600W for some of the tests.

Geekbench Version	Single core	Multi Core	Peak Power Consumption
6.0.3 Arm preview	1309	15160	279W
5.4.0 Arm preview	958	80639	586W

Geekbench 6 is on the left:

geerlingguy / sbc-reviews