Quuxplusone / LLVMBugzillaTest


Aarch64 memory benchmark performance vs ARMv7 #28309

Closed Quuxplusone closed 8 years ago

Quuxplusone commented 8 years ago
Bugzilla Link PR28310
Status RESOLVED INVALID
Importance P normal
Reported by PeteVine (tulipawn@gmail.com)
Reported on 2016-06-26 09:41:57 -0700
Last modified on 2016-08-28 17:21:13 -0700
Version 3.8
Hardware Other Linux
CC anton@korobeynikov.info, llvm-bugs@lists.llvm.org, tulipawn@gmail.com
Fixed by commit(s)
Attachments binary_trees_benchmark.zip (2335 bytes, application/zip)
Blocks
Blocked by
See also

Created attachment 16635 Benchmark files, cargo ready

Since yesterday, I've been playing with Rust on an aarch64 Cortex-A53 Android TV box (2GB RAM, Amlogic S905) that I'd converted to 64-bit Linux.

All's fine and good so far, except for memory benchmark performance, especially with jemalloc, which is worse relative to ARMv7 (and substantially worse in absolute terms).

Am I perhaps not enabling some erratum? The native aarch64 binary_trees benchmark (at tree depth 23) takes:

sysalloc 1m28s 5m10s 0m10s
jemalloc 1m35s 5m10s 0m53s

whereas the corresponding ARMv7 binaries (running on the same 64-bit system) take:

sysalloc 1m9s 3m59s 0m19s
jemalloc 1m11s 3m58s 0m25s
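
To make the gap easier to read, here is a minimal Rust sketch that converts the timings above to seconds and prints the aarch64/ARMv7 ratio for each column. The columns are not labeled in the report; they look like the real, user and sys figures printed by time, but that is an assumption, so the code only refers to them by position. The helper names parse_duration and ratio are made up for this illustration.

```rust
/// Parse a duration such as "1m28s" into whole seconds.
/// Returns None if the string is not in the "<min>m<sec>s" form.
fn parse_duration(s: &str) -> Option<u32> {
    let (min, rest) = s.split_once('m')?;
    let sec = rest.strip_suffix('s')?;
    Some(min.parse::<u32>().ok()? * 60 + sec.parse::<u32>().ok()?)
}

/// Ratio of the aarch64 time to the ARMv7 time (values above 1.0 mean aarch64 is slower).
fn ratio(aarch64: &str, armv7: &str) -> Option<f64> {
    Some(parse_duration(aarch64)? as f64 / parse_duration(armv7)? as f64)
}

fn main() {
    // (allocator, aarch64 columns, ARMv7 columns), copied from the report.
    let rows = [
        ("sysalloc", ["1m28s", "5m10s", "0m10s"], ["1m9s", "3m59s", "0m19s"]),
        ("jemalloc", ["1m35s", "5m10s", "0m53s"], ["1m11s", "3m58s", "0m25s"]),
    ];
    for (name, a64, a32) in rows.iter() {
        for col in 0..3 {
            if let Some(r) = ratio(a64[col], a32[col]) {
                println!("{} column {}: aarch64/ARMv7 = {:.2}", name, col + 1, r);
            }
        }
    }
}
```

The third column is where jemalloc stands out: 53 seconds on aarch64 against 25 seconds on ARMv7, while the sysalloc figure in the same column drops from 19 seconds to 10.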

@jmolloy I'm aware that better performance with 32-bit pointers is probably expected, but what about the jemalloc performance drop?

To reproduce, run cargo build --release && time target/release/binary_trees 23 inside the binary_trees directory. Uncomment the first 2 lines in main.rs to produce a sysalloc version.
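
The two lines in the attached main.rs are not reproduced in this report. As a rough sketch of what switching to the system allocator looks like on current stable Rust, the example below uses #[global_allocator] with std::alloc::System. That is an assumption about today's toolchain, not the contents of the attachment. On current toolchains the system allocator is already the default, so the attribute only makes the choice explicit. The Tree type and the functions bottom_up_tree and item_check are hypothetical stand-ins for the benchmark code.

```rust
use std::alloc::System;

// Route every heap allocation through the platform allocator (malloc on Linux).
// Assumption: this is the modern equivalent of the report's "sysalloc" build.
#[global_allocator]
static GLOBAL: System = System;

// Hypothetical stand-in for the benchmark's tree node.
struct Tree {
    left: Option<Box<Tree>>,
    right: Option<Box<Tree>>,
}

// Build a complete binary tree of the given depth, one heap allocation per child node.
fn bottom_up_tree(depth: u32) -> Tree {
    if depth == 0 {
        Tree { left: None, right: None }
    } else {
        Tree {
            left: Some(Box::new(bottom_up_tree(depth - 1))),
            right: Some(Box::new(bottom_up_tree(depth - 1))),
        }
    }
}

// Count the nodes by walking the whole tree.
fn item_check(tree: &Tree) -> u32 {
    1 + tree.left.as_deref().map_or(0, item_check)
        + tree.right.as_deref().map_or(0, item_check)
}

fn main() {
    // Tree depth from the first command-line argument, defaulting to a small value.
    let depth: u32 = std::env::args()
        .nth(1)
        .and_then(|a| a.parse().ok())
        .unwrap_or(10);
    let tree = bottom_up_tree(depth);
    println!("depth {} -> {} nodes", depth, item_check(&tree));
}
```

This sketch builds a single tree and is single-threaded, so it only shows how the allocator is selected and what kind of allocation pattern the benchmark stresses; it will not reproduce the timings in this report.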

Quuxplusone commented 8 years ago

Just as in PR 28345, the blobbed kernel was lying about the CPU frequency: the cores actually ran at 1.5GHz the whole time. I never mentioned it here, but the S905 was supposed to run at 2GHz, and that discrepancy is what actually led to this report.