camel-cdr / rvv-bench

A collection of RISC-V Vector (RVV) benchmarks to help developers write portably performant RVV code
MIT License
86 stars, 12 forks

Issue with 'illegal instruction' when using bench with spike #8

Open LMiaoH opened 11 months ago

LMiaoH commented 11 months ago

Hello, I am trying to run the benchmarks on Spike. After running 'make all', I get the following error when executing the generated binary with Spike:

li@h107:~/rvv-bench/bench$ ~/tools/riscv-isa-sim/build/spike --isa=rv64gcv1p0 -l --log-commits --log="memcpy.spike" `which pk` memcpy
bbl loader
z  0000000000000000 ra 0000000000000000 sp 0000003ffffffb40 gp 0000000000000000
tp 0000000000000000 t0 0000000000000000 t1 0000000000000000 t2 0000000000000000
s0 0000000000000000 s1 0000000000000000 a0 0000000000014048 a1 0000000000014000
a2 0000000000000000 a3 0000000000000000 a4 0000000000000000 a5 0000000000000000
a6 0000000000000000 a7 0000000000000000 s2 0000000000000000 s3 0000000000000000
s4 0000000000000000 s5 0000000000000000 s6 0000000000000000 s7 0000000000000000
s8 0000000000000000 s9 0000000000000000 sA 0000000000000000 sB 0000000000000000
t3 0000000000000000 t4 0000000000000000 t5 0000000000000000 t6 0000000000000000
pc 000000000001134c va/inst 00000000c00025f3 sr 8000000200006620
An illegal instruction was executed!

The relevant portion in the log file is as follows:

   core   0: 0x000000000001134c (0xc00025f3) csrr    a1, cycle
   core   0: exception trap_illegal_instruction, epc 0x000000000001134c
   core   0:           tval 0x00000000c00025f3
   core   0: >>>>  trap_vector

What could be the cause of this issue, and do you have any suggestions for resolving it? By the way, I'm using the following version of the clang compiler:

li@h107:~/rvv-bench/bench$ clang -v
clang version 15.0.0 (https://repo.hca.bsc.es/gitlab/rferrer/llvm-epi.git 142ea58f56d9622cc03d43e6ecffd9634d801546)
Target: riscv64-unknown-linux-gnu
Thread model: posix
camel-cdr commented 11 months ago

As far as I can tell, this is a problem with Spike or pk not exposing the cycle CSR.

I'm not sure what's going on, though, as pk enables it here:

https://github.com/riscv-software-src/riscv-pk/blob/710c23a5bbeecf171ac86d6e39d275af8f176354/machine/minit.c#L54-L58

If you remove the __asm statement in rv_cycle and either remove _zfh_zba_zbb_zbs from config.mk or add those extensions to the Spike ISA string, it runs for me.
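For context: the trapping instruction in the log, csrr a1, cycle (0xc00025f3), is what the rdcycle pseudo-instruction assembles to, so any inline-asm cycle read will hit it. The sketch below is a hypothetical cycle-read helper with a compile-time escape hatch; it is not the actual rv_cycle from rvv-bench, and both read_cycles and the NO_RDCYCLE macro are illustrative names, not existing rvv-bench identifiers.

```c
#include <stdint.h>

/* Hypothetical helper, not rvv-bench's rv_cycle. Define NO_RDCYCLE
 * (an illustrative macro, not a real rvv-bench option) when the
 * target traps on reads of the cycle CSR. */
static inline uint64_t read_cycles(void)
{
#if defined(__riscv) && __riscv_xlen == 64 && !defined(NO_RDCYCLE)
    uint64_t cycle;
    /* rdcycle expands to `csrr rd, cycle`, the instruction that
     * raised trap_illegal_instruction under Spike + pk. */
    __asm volatile ("rdcycle %0" : "=r"(cycle));
    return cycle;
#else
    /* Counter unavailable (or not RV64): the program still runs,
     * but every interval measured with this helper is 0. */
    return 0;
#endif
}
```

Stubbing the read out like this trades the crash for timings that carry no information, which is the same trade-off as deleting the __asm statement.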

LMiaoH commented 11 months ago

Thanks. Unfortunately, these suggestions don't seem to have any effect; the same issue persists.

What should I try next?

camel-cdr commented 11 months ago

Here is a Dockerfile that demonstrates what I was referring to:

FROM ubuntu:23.04

RUN apt-get update \
    && apt-get install -y build-essential wget git gcc-riscv64-linux-gnu clang device-tree-compiler lld \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

RUN git clone --depth=1 https://github.com/riscv-software-src/riscv-isa-sim \
    && cd riscv-isa-sim \
    && ./configure \
    && make -j $(nproc) \
    && make install \
    && cd .. \
    && rm -rf riscv-isa-sim

RUN git clone --depth=1 https://github.com/riscv-software-src/riscv-pk \
    && mkdir riscv-pk/build \
    && cd riscv-pk/build \
    && ../configure --host=riscv64-linux-gnu --with-arch=rv64gcv --with-abi=lp64d \
    && make -j $(nproc) \
    && make install \
    && cd ../.. \
    && rm -rf riscv-pk

RUN git clone --recursive https://github.com/camel-cdr/rvv-bench \
    && cd rvv-bench \
    && sed -i 's/_zfh//g' config.mk \
    && cd bench \
    && sed -i 's/.*rdcycle.*//g' bench.h \
    && make

WORKDIR /rvv-bench/bench
RUN spike --isa=rv64gcv /usr/local/riscv64-linux-gnu/bin/pk ./memcpy

Since it removes the rdcycle code, it won't report any meaningful timing results, but Spike wouldn't have given you those anyway, as simulators don't reflect what happens on actual hardware. IIRC, QEMU just passes through your host's cycle counter; I'm not sure how Spike is supposed to implement it.

I personally use QEMU for testing, so I'd recommend trying that if you can. My setup uses the Debian clang and qemu-user packages.

What specifically are you using this for? Maybe I can help better that way.

LMiaoH commented 11 months ago

Thanks, I applied your suggestions successfully. However, I'm wondering how long these test programs might take to run on Spike. We've been running the memcpy benchmark for over 24 hours now, and here are some snippets of the output:

bbl loader
{
title: "memcpy",
labels: ["0","musl","scalar","scalar_autovec","rvv_m1","rvv_m2","rvv_m4","rvv_m8","rvv_align_dest_m1","rvv_align_dest_m2","rvv_align_dest_m4","rvv_align_dest_m8","rvv_align_src_m1","rvv_align_src_m2","rvv_align_src_m4","rvv_align_src_m8","rvv_align_dest_hybrid_m1","rvv_align_dest_hybrid_m2","rvv_align_dest_hybrid_m4","rvv_align_dest_hybrid_m8","rvv_tail_m1","rvv_tail_m2","rvv_tail_m4","rvv_tail_m8","rvv_128_m1","rvv_128_m2","rvv_128_m4","rvv_128_m8",],
data: [
[1,4,7,11,15,20,25,31,38,46,55,65,77,91,107,125,145,168,195,225,260,300,345,397,456,524,601,689,790,905,1037,1188,1360,1557,1782,2039,2333,2669,3053,3492,3993,4566,5221,5969,6824,7801,8918,10195,11654,13321,15227,17405,19894,22739,25990,29705,33951,38804,44350,50688,57932,66211,75672,86485,98843,112966,129107,147553,168635,192728,220263,251732,287696,328798,375772,429456,490809,560927,641062,732645,837311,956929,1093636,1249872,1428428,1632492,1865708,2132240,2436848,2784972,3182828,3637520,4157168,4751052,5429776,6205461,7091958,8105097,9262971,10586255,12098580,13826951,15802232,],
[18446744073709551615.9551615,18446744073709551615.9551615,18446744073709551615.9551615,18446744073709551615.9551615,18446744073709551615.9551615,18446744073709551615.9551615,18446744073709551615.9551615,18446744073709551615.9551615,18446744073709551615.9551615,18446744073709551615.9551615,18446744073709551615.9551615,18446744073709551615.9551615,18446744073709551615.9551615,18446744073709551615.9551615,18446744073709551615.9551615,18446744073709551615.9551615,18446744073709551615.9551615,18446744073709551615.9551615,18446744073709551615.9551615,18446744073709551615.9551615,18446744073709551615.9551615,18446744073709551615.9551615,18446744073709551615.9551615,18446744073709551615.9551615,18446744073709551615.9551615,18446744073709551615.9551615,18446744073709551615.9551615,18446744073709551615.9551615,18446744073709551615.9551615,18446744073709551615.9551615,18446744073709551615.9551615,18446744073709551615.9551615,18446744073709551615.9551615,18446744073709551615.9551615,18446744073709551615.9551615,18446744073709551615.9551615,18446744073709551615.9551615,18446744073709551615.9551615,18446744073709551615.9551615,18446744073709551615.9551615,18446744073709551615.9551615,18446744073709551615.9551615,18446744073709551615.9551615,18446744073709551615.9551615,18446744073709551615.9551615,18446744073709551615.9551615,18446744073709551615.9551615,18446744073709551615.9551615,18446744073709551615.9551615,
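The second row above repeats one value, 18446744073709551615, which is exactly UINT64_MAX (2^64 - 1). The thread doesn't say which code path produces it here, but it is consistent with the timing code having lost its cycle counter. One common way this value shows up in timing code is unsigned wrap-around: a difference that "should" be -1 becomes the largest 64-bit value. The sketch below only demonstrates that arithmetic; elapsed_cycles is a hypothetical name, not taken from rvv-bench.

```c
#include <stdint.h>

/* Hypothetical helper, not from rvv-bench: cycles between two
 * counter reads. Unsigned subtraction is performed modulo 2^64,
 * so if `end` is smaller than `start` (e.g. because a stubbed-out
 * counter returned garbage), the result wraps around to a huge
 * value instead of going negative. */
static uint64_t elapsed_cycles(uint64_t start, uint64_t end)
{
    return end - start;
}
```

For example, elapsed_cycles(1, 0) returns 18446744073709551615, the value repeated in the output.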
camel-cdr commented 11 months ago

The default runtime is tuned for getting good measurements on real hardware; you can reduce it by editing the bench/config.h file. Everything depends on MAX_MEM.
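To see why MAX_MEM drives the runtime, note that the first row of the memcpy output (1, 4, 7, 11, 15, 20, 25, 31, ..., 13826951, 15802232) matches the growth rule n + n/7 + 3 in integer arithmetic. The sketch below assumes that rule and treats the size limit as a stand-in for MAX_MEM (or whatever share of it one benchmark uses). next_size and sweep are hypothetical names; the real rule and parameters live in bench/config.h and may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical growth rule, inferred from the output above
 * (1 -> 4 -> 7 -> 11 -> 15 -> 20 ...); not copied from config.h. */
static size_t next_size(size_t n)
{
    return n + n / 7 + 3;
}

/* Walk every size up to `limit` (a stand-in for MAX_MEM) and report
 * how many sizes get tested and how many bytes a single pass over
 * all of them touches. Repeats of each measurement are ignored. */
static void sweep(size_t limit, size_t *count, uint64_t *total_bytes)
{
    *count = 0;
    *total_bytes = 0;
    for (size_t n = 1; n <= limit; n = next_size(n)) {
        ++*count;
        *total_bytes += n;
    }
}
```

Because the sizes grow geometrically (by roughly 8/7 per step), the largest sizes dominate the total. Lowering the limit from 2^24 to 2^20 therefore drops only about 20 sizes but cuts the bytes per pass roughly in proportion to the limit. Any repetition of each measurement would multiply these totals.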