bytecodealliance / wasm-micro-runtime

WebAssembly Micro Runtime (WAMR)
Apache License 2.0

sysbench memory and threads workloads give lower benchmark results with iwasm than with native aarch64 gcc execution #3752

Open subhakr opened 3 months ago

subhakr commented 3 months ago

Subject of the issue

I ran sysbench in both modes: natively on aarch64 (built with gcc) and under the WAMR runtime. The native aarch64 gcc build gives better results, as shown below.

root@s32r45evb:~/Sysbench_S32R45_WAMR_GCC/sysbench_wasm# sysbench memory --memory-block-size=1K --memory-total-size=3G --time=3 run
sysbench 1.1.0-de18a03 (using bundled LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time

Running memory speed test with the following options:
  block size: 1KiB
  total size: 3072MiB
  operation: write
  scope: global

Initializing worker threads...

Threads started!

Total operations: 2599323 (866387.19 per second)

2538.40 MiB transferred (846.08 MiB/sec)

Throughput:
    events/s (eps):                      866387.1913
    time elapsed:                        3.0002s
    total number of events:              2599323

Latency (ms):
         min:                                    0.00
         avg:                                    0.00
         max:                                    0.12
         95th percentile:                        0.00
         sum:                                 1308.86

Threads fairness:
    events (avg/stddev):           2599323.0000/0.00
    execution time (avg/stddev):   1.3089/0.00

But on aarch64 with WAMR I am getting lower benchmark results, as shown below:

root@s32r45evb:~/Sysbench_S32R45_WAMR_GCC/sysbench_wasm# ./iwasm  sysbench.aot memory --memory-block-size=1K --memory-total-size=3G --time=3 run
Attempting to allocate 1064960 bytes of memory...
sysbench 1.1.0-2ca9e3f (using Lua Lua 5.3)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time

Running memory speed test with the following options:
  block size: 1KiB
  total size: 3072MiB
  operation: write
  scope: global

Initializing worker threads...

Threads started!

Total operations: 532424 (177466.05 per second)

519.95 MiB transferred (173.31 MiB/sec)

Throughput:
    events/s (eps):                      177466.0549
    time elapsed:                        3.0001s
    total number of events:              532424

Latency (ms):
         min:                                    0.00
         avg:                                    0.00
         max:                                    0.16
         95th percentile:                        0.00
         sum:                                 1366.00

Threads fairness:
    events (avg/stddev):           532424.0000/0.00
    execution time (avg/stddev):   1.3660/0.00
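
To put the two memory runs side by side, the gap can be computed directly from the posted throughput figures. The Python sketch below uses only the numbers from the two logs above; the variable and function names are illustrative.

```python
# Throughput figures copied from the two sysbench memory runs above.
native_eps = 866387.19   # native aarch64 gcc build, events per second
iwasm_eps = 177466.05    # iwasm running sysbench.aot, events per second

native_mib_s = 846.08    # native, MiB/sec
iwasm_mib_s = 173.31     # iwasm, MiB/sec


def slowdown(native: float, wasm: float) -> float:
    """Return how many times slower the wasm run is than the native run."""
    return native / wasm


eps_ratio = slowdown(native_eps, iwasm_eps)
mib_ratio = slowdown(native_mib_s, iwasm_mib_s)

print(f"events/s: iwasm is {eps_ratio:.2f}x slower than native")
print(f"MiB/sec:  iwasm is {mib_ratio:.2f}x slower than native")
```

Both metrics give the same factor, a little under 5x, because every event in this test writes one 1 KiB block. Note that the two logs also report different sysbench builds (1.1.0-de18a03 with LuaJIT versus 1.1.0-2ca9e3f with Lua 5.3).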

Test case

sysbench.zip: this zip file contains sysbench.aot, which is built for aarch64.

Your environment

Host OS: Ubuntu 22.04 LTS
WAMR version: 2.1.1
CPU architecture: aarch64
RAM: 3GB
Internal space: 128GB

Steps to reproduce

For iwasm:
./iwasm sysbench.aot memory --memory-block-size=1K --memory-total-size=3G --time=3 run

For native aarch64, install sysbench and then run the command below:
sysbench memory --memory-block-size=1K --memory-total-size=3G --time=3 run

Expected behavior

When I ran sysbench.aot for the cpu workload with iwasm, I got better results than with native aarch64 sysbench built with gcc, so I expected similar results for the memory workload.

Actual behavior

But I am getting lower benchmark results than with aarch64 sysbench built with gcc.

Extra Info

I was expecting better results with the iwasm sysbench.aot command, but I am getting lower results, so I would like to know how to get better results with iwasm. I see the same thing with the threads workload.

Please check the threads workload as well.

These are the commands for iwasm with wasi-sdk 23:

root@s32r45evb:~/Sysbench_S32R45_WAMR_GCC/sysbench_wasm# sysbench threads --threads=8 --time=3 run
sysbench 1.1.0-de18a03 (using bundled LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 8
Initializing random number generator from current time

Initializing worker threads...

Threads started!

Throughput:
    events/s (eps):                      454.7764
    time elapsed:                        3.0125s
    total number of events:              1370

Latency (ms):
         min:                                    2.80
         avg:                                   17.54
         max:                                  188.05
         95th percentile:                       70.55
         sum:                                24025.06

Threads fairness:
    events (avg/stddev):           171.2500/13.71
    execution time (avg/stddev):   3.0031/0.00

For aarch64 with gcc:

root@s32r45evb:~/Sysbench_S32R45_WAMR_GCC/sysbench_wasm# sysbench threads --threads=8 --time=3 run
sysbench 1.1.0-de18a03 (using bundled LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 8
Initializing random number generator from current time

Initializing worker threads...

Threads started!

Throughput:
    events/s (eps):                      454.7764
    time elapsed:                        3.0125s
    total number of events:              1370

Latency (ms):
         min:                                    2.80
         avg:                                   17.54
         max:                                  188.05
         95th percentile:                       70.55
         sum:                                24025.06

Threads fairness:
    events (avg/stddev):           171.2500/13.71
    execution time (avg/stddev):   3.0031/0.00

TianlongLiang commented 3 months ago

I think the threads benchmark workload results you posted are duplicated

Can you also tell me the command you used to compile the AOT files? From that command I can tell, for example, whether you are using software boundary checks (which cause performance loss when there is heavy I/O).

You can also refer to this document to see whether there is any helpful information you could use to analyze the performance gap further
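
For context on the boundary-check question: with software boundary checks, each linear-memory access is preceded by an explicit range comparison, which adds work to memory-heavy loops. The Python sketch below only illustrates that idea. `LinearMemory`, `store_checked`, and `checks_done` are hypothetical names, not WAMR APIs, and WAMR's generated AOT code is not structured like this.

```python
class LinearMemory:
    """Toy model of a wasm linear memory (hypothetical, not a WAMR API)."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(size)
        self.checks_done = 0  # counts the explicit comparisons performed

    def store_checked(self, offset: int, payload: bytes) -> None:
        # Software boundary check: compare the access range against the
        # memory size before touching the memory.
        self.checks_done += 1
        if offset < 0 or offset + len(payload) > len(self.data):
            raise IndexError("out of bounds memory access")
        self.data[offset:offset + len(payload)] = payload


mem = LinearMemory(4096)
block = bytes([0xAB]) * 1024

# Four in-range 1 KiB writes, each paying for one explicit check.
for i in range(4):
    mem.store_checked(i * 1024, block)

# An access that crosses the end of memory is rejected by the check.
try:
    mem.store_checked(4000, block)
    rejected = False
except IndexError:
    rejected = True

print(mem.checks_done, rejected)
```

The point of the sketch is only that the comparison runs on every access, so a benchmark dominated by memory writes pays for it many times over.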

subhakr commented 3 months ago

CFLAGS="-O3 -funroll-loops --sysroot=/home/admin1/Downloads/wasi-sdk-23.0-x86_64-linux/share/wasi-sysroot -pthread -fexceptions -D_WASI_EMULATED_PROCESS_CLOCKS -matomics -mbulk-memory"

LDFLAGS="--sysroot=/home/admin1/Downloads/wasi-sdk-23.0-x86_64-linux/share/wasi-sysroot -pthread -fexceptions -Wl,--shared-memory -g -lwasi-emulated-mman -Wl,--export-all -Wl,--no-entry -Wl,--export=__heap_base -Wl,--export=__data_end -pthread -lwasi-emulated-process-clocks -Wl,--initial-memory=2147483648 -Wl,--max-memory=2147483648"

make CFLAGS="$CFLAGS" LDFLAGS="$LDFLAGS"

This generates the sysbench wasm module. After that, I converted the sysbench wasm module into .aot with wamrc, using the command below:

/home/admin1/Public/wasm-micro-runtime/wamr-compiler/build/wamrc --enable-multi-thread -o sysbench.aot src/sysbench

Then I run the sysbench workloads. Here is the result of the sysbench threads workload with the WASM build:

Attempting to allocate 1064960 bytes of memory...
sysbench 1.1.0-2ca9e3f (using Lua Lua 5.3)

Running the test with following options:
Number of threads: 8
Initializing random number generator from current time

Initializing worker threads...

Threads started!

Throughput:
    events/s (eps):                      315.7304
    time elapsed:                        3.0342s
    total number of events:              958

Latency (ms):
         min:                                    2.86
         avg:                                   25.18
         max:                                  407.28
         95th percentile:                      176.73
         sum:                                24124.87

Threads fairness:
    events (avg/stddev):           119.7500/20.50
    execution time (avg/stddev):   3.0156/0.01

subhakr commented 3 months ago

One more thing: for the cpu workload I am also getting drastically different results with the sysbench wasm module, as shown below.

admin1@admin1-VivoBook-ASUSLaptop-X515EA-P1511CEA:~/sysbench_main$ /home/admin1/Documents/wasm-micro-runtime/product-mini/platforms/linux/build/iwasm sysbench.aot cpu --cpu-max-prime=20000 --time=3 run
Attempting to allocate 1064960 bytes of memory...
sysbench 1.1.0-2ca9e3f (using Lua Lua 5.3)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time

Prime numbers limit: 20000

Initializing worker threads...

Threads started!

CPU speed:
    events per second: 773738.73

Throughput:
    events/s (eps):                      773738.7288
    time elapsed:                        3.0002s
    total number of events:              2321354

Latency (ms):
         min:                                    0.00
         avg:                                    0.00
         max:                                    0.05
         95th percentile:                        0.00
         sum:                                  688.21

Threads fairness:
    events (avg/stddev):           2321354.0000/0.00
    execution time (avg/stddev):   0.6882/0.00

With the gcc build of sysbench I am getting the results below.

admin1@admin1-VivoBook-ASUSLaptop-X515EA-P1511CEA:~/sysbench_main$ sysbench cpu --cpu-max-prime=20000 --time=3 run
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time

Prime numbers limit: 20000

Initializing worker threads...

Threads started!

CPU speed:
    events per second:  1340.51

General statistics:
    total time:                          3.0005s
    total number of events:              4024

Latency (ms):
         min:                                    0.73
         avg:                                    0.75
         max:                                    1.61
         95th percentile:                        0.78
         sum:                                 2999.55

Threads fairness:
    events (avg/stddev):           4024.0000/0.00
    execution time (avg/stddev):   2.9996/0.00

Why am I getting results that differ this much with the wasm module?
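
One way to read the two cpu runs above is to convert them into time per event, using only the posted figures. The Python sketch below does that arithmetic; the variable and function names are illustrative.

```python
# Figures copied from the two sysbench cpu runs above (--cpu-max-prime=20000).
wasm_events, wasm_exec_s = 2321354, 0.6882    # iwasm sysbench.aot (sysbench 1.1.0, Lua 5.3)
native_events, native_exec_s = 4024, 2.9996   # gcc build (sysbench 1.0.20, LuaJIT)


def microseconds_per_event(exec_seconds: float, events: int) -> float:
    """Average execution time spent on one event, in microseconds."""
    return exec_seconds / events * 1e6


wasm_us = microseconds_per_event(wasm_exec_s, wasm_events)
native_us = microseconds_per_event(native_exec_s, native_events)

print(f"iwasm:  {wasm_us:.3f} us per event")
print(f"native: {native_us:.1f} us per event")
print(f"a native event takes about {native_us / wasm_us:.0f}x longer")
```

A gap of this size in favour of the wasm build, together with the different sysbench versions and Lua runtimes shown in the two logs, suggests the two binaries may not be doing the same amount of work per event, so the figures may not be directly comparable.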

TianlongLiang commented 2 months ago

Can you share more details on how to compile sysbench to wasm? I tried your command in the root directory of sysbench, and LuaJIT reports an error because it does not support the wasm architecture:

# command
make CFLAGS="$CFLAGS" LDFLAGS="$LDFLAGS" CC=/opt/wasi-sdk/bin/clang

error:

Making all in third_party/luajit
make[1]: Entering directory '/home/tl/TL/clion_projects/sysbench/third_party/luajit'
make -C ./luajit clean
make[2]: Entering directory '/home/tl/TL/clion_projects/sysbench/third_party/luajit/luajit'
make -C src clean
make[3]: Entering directory '/home/tl/TL/clion_projects/sysbench/third_party/luajit/luajit/src'
lj_arch.h:69:2: error: "No support for this architecture (yet)"
   69 | #error "No support for this architecture (yet)"
      |  ^
lj_arch.h:439:2: error: "No target architecture defined"
  439 | #error "No target architecture defined"
      |  ^
2 errors generated.
lj_arch.h:69:2: error: "No support for this architecture (yet)"
   69 | #error "No support for this architecture (yet)"
      |  ^
lj_arch.h:439:2: error: "No target architecture defined"
  439 | #error "No target architecture defined"
      |  ^
2 errors generated.
lj_arch.h:69:2: error: "No support for this architecture (yet)"
   69 | #error "No support for this architecture (yet)"
      |  ^
lj_arch.h:439:2: error: "No target architecture defined"
  439 | #error "No target architecture defined"
      |  ^
2 errors generated.
lj_arch.h:69:2: error: "No support for this architecture (yet)"
   69 | #error "No support for this architecture (yet)"
      |  ^
lj_arch.h:439:2: error: "No target architecture defined"
  439 | #error "No target architecture defined"
      |  ^
2 errors generated.
lj_arch.h:69:2: error: "No support for this architecture (yet)"
   69 | #error "No support for this architecture (yet)"
      |  ^
lj_arch.h:439:2: error: "No target architecture defined"
  439 | #error "No target architecture defined"
      |  ^
2 errors generated.
lj_arch.h:69:2: error: "No support for this architecture (yet)"
   69 | #error "No support for this architecture (yet)"
      |  ^
lj_arch.h:439:2: error: "No target architecture defined"
  439 | #error "No target architecture defined"
      |  ^
2 errors generated.
Makefile:271: *** Unsupported target architecture.  Stop.
make[3]: Leaving directory '/home/tl/TL/clion_projects/sysbench/third_party/luajit/luajit/src'
make[2]: *** [Makefile:166: clean] Error 2
make[2]: Leaving directory '/home/tl/TL/clion_projects/sysbench/third_party/luajit/luajit'
make[1]: *** [Makefile:501: lib/libluajit-5.1.a] Error 2
make[1]: Leaving directory '/home/tl/TL/clion_projects/sysbench/third_party/luajit'
make: *** [Makefile:478: all-recursive] Error 1

subhakr commented 2 months ago

Yes, LuaJIT does not support wasm, which is why I used Lua 5.3. The files in the Lua and Concurrency Kit directories also have to be compiled to either .bc or .wasm format. I placed the two resulting libraries, libck.a and liblua.a, in the directory below:

/home/admin1/Downloads/wasi-sdk-23.0-x86_64-linux/share/wasi-sysroot/lib

I will share my sysbench source for your reference: sysbench_main.zip

If you want to try my sysbench source, please change the Makefile paths accordingly, and use these libraries for liblua and libck. I am using wasi-sdk 23. Please place these files in the path mentioned above: libraries.zip

NOTE: The libraries directory contains a liblua.a_bk file that includes the setjmp and longjmp functions. In liblua.a I have commented them out because they are not supported at run time. If it is possible to resolve this, could you try to fix the issue?

subhakr commented 2 months ago

Hi, is there any update from your side?

Thanks in advance.

subhakr commented 2 months ago

Could you please tell me if there is any information on the above issues?

TianlongLiang commented 2 months ago

Sorry, I got caught up with a few other things last week. I will investigate it now.

subhakr commented 2 months ago

ok thank you.

subhakr commented 2 months ago

Hi @TianlongLiang, is there any information about the above issues?

Thanks in advance.

TianlongLiang commented 2 months ago

I still can't compile your sysbench-main to wasm. I just used the commands I would use to compile normal sysbench:

./autogen.sh
# Add --with-pgsql to build with PostgreSQL support
./configure
make -j
CFLAGS="$CFLAGS" LDFLAGS="$LDFLAGS" CC=/opt/wasi-sdk/bin/clang

And it still emits a bunch of errors:

In file included from /home/tl/TL/clion_projects/sysbench-wamr/sysbench_main/third_party/concurrency_kit/include/ck_spinlock.h:33:
/home/tl/TL/clion_projects/sysbench-wamr/sysbench_main/third_party/concurrency_kit/include/spinlock/dec.h:65:2: error: call to undeclared function 'ck_pr_fence_lock'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
   65 |         ck_pr_fence_lock();
      |         ^
/home/tl/TL/clion_projects/sysbench-wamr/sysbench_main/third_party/concurrency_kit/include/spinlock/dec.h:75:2: error: call to undeclared function 'ck_pr_fence_acquire'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
   75 |         ck_pr_fence_acquire();
      |         ^
/home/tl/TL/clion_projects/sysbench-wamr/sysbench_main/third_party/concurrency_kit/include/spinlock/dec.h:99:2: error: call to undeclared function 'ck_pr_fence_lock'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
   99 |         ck_pr_fence_lock();
      |         ^
/home/tl/TL/clion_projects/sysbench-wamr/sysbench_main/third_party/concurrency_kit/include/spinlock/dec.h:118:2: error: call to undeclared function 'ck_pr_fence_lock'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
  118 |         ck_pr_fence_lock();
      |         ^

I don't know whether there is more configuration I need to modify other than the wasi-sdk path and the user directory. I also don't know why it still compiles the third_party libraries.

Could you please provide more details on how to compile it using the existing libraries you sent me, along with the commands for compiling the wasm version of sysbench? It seems that a simple make won't do it.

subhakr commented 2 months ago

Sorry for the late reply. I will look into it and then explain clearly how I integrated it. Thank you in advance.