Open zhongjuzhe opened 6 months ago
Hi @zhongjuzhe, if you click on "Example measurement code for vadd.vx" you can see an example of what code I use to measure throughput.
To use this repo yourself, run:
cd instructions/rvv
make
This builds the executable. You can either execute it yourself or run make run to have run.sh execute it automatically.
Since later Linux versions disable user-level access to the performance counters, you need to re-enable it: either set the perf_user_access sysctl (see this article), or add -DENABLE_RDCYCLE_HACK to the CFLAGS in ./config.mk. The latter works by using perf_event_open to measure the cycle count, because this somehow also enables user-level access to the performance counter.
If you are on a more obscure platform, you may need to modify ./nolibc.h to work for it.
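As a rough illustration of the perf_event_open route mentioned above, here is a minimal sketch of opening a user-space CPU cycle counter on Linux. It is not the repo's actual code: the function names `cycle_counter_attr` and `open_cycle_counter` are made up for this example, and whether keeping such an event open also makes rdcycle usable from user space depends on the kernel and platform.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for syscall() */
#endif
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: describe a hardware cycle counter that only
 * counts user-space execution. */
struct perf_event_attr
cycle_counter_attr(void)
{
	struct perf_event_attr attr;
	memset(&attr, 0, sizeof attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.size = sizeof attr;
	attr.config = PERF_COUNT_HW_CPU_CYCLES;
	attr.exclude_kernel = 1; /* do not count kernel cycles */
	attr.exclude_hv = 1;     /* do not count hypervisor cycles */
	return attr;
}

/* Hypothetical helper: returns an open perf fd, or -1 (with errno set)
 * if the kernel refuses, e.g. because of perf_event_paranoid. */
int
open_cycle_counter(void)
{
	struct perf_event_attr attr = cycle_counter_attr();
	/* pid = 0, cpu = -1: the calling thread on any CPU;
	 * group_fd = -1, flags = 0: standalone event, no special flags */
	return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
}
```

The descriptor has to stay open for as long as the counter is needed. A 64-bit count can also be read from it with `read(fd, &count, sizeof count)`.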
I'll try to update the README soon, and add a wiki page for instructions on different configurations.
Please tell me if you still run into problems.
Is it possible to run instructions/rvv on bare metal?
I tried the following command: clang -march=rv64gcv -O3 main.c
but it failed with several undefined references:
undefined reference to 'bench_types'. ....
etc.
Yes, it is. You'll have to replace the rdcycle rd instructions with csrr rd, mcycle, and implement memwrite and the proper entry to main in nolibc.h.
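To make the counter swap concrete, here is a small C sketch that selects between the two instructions at compile time. The repo does this in its assembly sources, so this is only an illustration. `read_cycles` is a hypothetical name, the use of `READ_MCYCLE` as the switch is assumed from the define in the config further down, and the `clock()` branch exists only so the snippet also builds on a non-RISC-V host.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: read the cycle counter.
 * On RV32 both instructions return only the low 32 bits. */
uint64_t
read_cycles(void)
{
#if defined(__riscv) && defined(READ_MCYCLE)
	unsigned long c;
	/* machine mode (bare metal): read the mcycle CSR directly */
	__asm__ volatile("csrr %0, mcycle" : "=r"(c));
	return c;
#elif defined(__riscv)
	unsigned long c;
	/* user mode: rdcycle reads the unprivileged cycle CSR */
	__asm__ volatile("rdcycle %0" : "=r"(c));
	return c;
#else
	/* host fallback for trying the snippet out; this is not a cycle count */
	return (uint64_t)clock();
#endif
}
```

A measurement then brackets the code under test with two `read_cycles()` calls and subtracts the first value from the second.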
Your command doesn't work, because you also need to preprocess (with m4) and build main.S, just look at how the Makefile does it.
I'll add some examples this weekend, including one for running baremetal on the t1 rtl simulation. That should help.
I've updated the README, but didn't get to writing the wiki, because the new t1 image doesn't work as expected. I'll create it once that has been fixed.
For now, here is how I built the bare-metal benchmark for it before. You will probably need a different compiler configuration and memwrite implementation, but this should be roughly what you need to modify for a bare-metal system.
You should already have a linker configuration and entry point if you run on bare metal, so use those instead of the t1 specific ones here.
# config.mk
WARN=-Wall -Wextra -Wno-unused-function -Wno-unused-parameter
CC=clang
CFLAGS=--target=riscv32 -march=rv32gc_zve32f -mabi=ilp32 -mno-relax -static -mcmodel=medany -fvisibility=hidden -nostdlib -fno-builtin -ffreestanding -fno-PIC ${WARN} -T /t1.ld /t1_main.S -DCUSTOM_HOST -DREAD_MCYCLE
# t1_main.S
# from: https://github.com/chipsalliance/t1/blob/master/tests/t1_main.S
.globl _start
_start:
li a0, 0x2200 # VS&FS
csrs mstatus, a0
csrwi vcsr, 0
#csrwi mcounteren,7
li a0, -8
csrw mcountinhibit,a0
#csrr a0, mcycle
la sp, __stacktop
// no ra to save
call nolibc_start
// exit
li a0, 0x10000000
li a1, -1
sw a1, 4(a0)
csrwi 0x7cc, 0
.p2align 2
// t1.ld
// from https://github.com/chipsalliance/t1/blob/master/tests/t1.ld
OUTPUT_ARCH(riscv)
ENTRY(_start)
MEMORY {
SCALAR (RWX) : ORIGIN = 0x20000000, LENGTH = 512M /* put first to set it as default */
MMIO (RW) : ORIGIN = 0x00000000, LENGTH = 512M
DDR (RW) : ORIGIN = 0x40000000, LENGTH = 2048M
SRAM (RW) : ORIGIN = 0xc0000000, LENGTH = 4M /* TODO: read from config */
}
SECTIONS {
. = ORIGIN(SCALAR);
.text : { *(.text .text.*) }
. = ALIGN(0x1000);
.data : { *(.data .data.*) }
. = ALIGN(0x1000);
.sdata : { *(.sdata .sdata.*) }
. = ALIGN(0x1000);
.srodata : { *(.srodata .srodata.*) }
. = ALIGN(0x1000);
.bss : { *(.bss .bss.*) }
_end = .; PROVIDE (end = .);
. = ORIGIN(SRAM);
.vdata : { *(.vdata .vdata.*) } >SRAM
.vbss (TYPE = SHT_NOBITS) : { *(.vbss .vbss.*) } >SRAM
__stacktop = ORIGIN(SCALAR) + LENGTH(SCALAR); /* put stack on the top of SCALAR */
__heapbegin = ORIGIN(DDR); /* put heap on the begin of DDR */
}
// nolibc.h
...
#ifdef CUSTOM_HOST
#define IFHOSTED(...)
#define EXIT_FAILURE 1
#define EXIT_SUCCESS 0
/* customize me */
// output to t1 uart
static void
memwrite(void const *ptr, size_t len) {
struct uartlite_regs {
unsigned int rx_fifo;
unsigned int tx_fifo;
unsigned int status;
unsigned int control;
};
volatile struct uartlite_regs *const ttyUL0 = (struct uartlite_regs *)0x10000000;
unsigned char const *p = ptr; /* keep const, ptr is void const * */
while (len--) {
while (ttyUL0->status & (1<<3)); /* uartlite: wait while the TX FIFO is full */
ttyUL0->tx_fifo = *p++;
}
}
// static size_t /* only needed for vector-utf/bench.c */
// memread(void *ptr, size_t len) { }
static void
exit(int x) { __asm volatile("unimp\n"); }
int main(void);
void nolibc_start(void) {
int x = main();
flush();
}
#elif __STDC_HOSTED__
...
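When porting memwrite to another board, it can help to separate the polling loop from the hard-coded base address so the loop can be checked on a host first. The sketch below does that under stated assumptions: `uart_regs`, `UART_TX_FULL`, and `uart_write` are hypothetical names, and the register layout and status bit are simply copied from the excerpt above, so they only apply to a uartlite-style device.

```c
#include <stddef.h>

/* Register layout copied from the uartlite_regs struct in the excerpt. */
struct uart_regs {
	unsigned int rx_fifo;
	unsigned int tx_fifo;
	unsigned int status;
	unsigned int control;
};

#define UART_TX_FULL (1u << 3) /* the same status bit the excerpt polls */

/* Hypothetical helper: like memwrite, but the UART base is a parameter. */
void
uart_write(volatile struct uart_regs *uart, void const *ptr, size_t len)
{
	unsigned char const *p = ptr;
	while (len--) {
		while (uart->status & UART_TX_FULL)
			; /* busy-wait until there is room in the TX FIFO */
		uart->tx_fifo = *p++;
	}
}
```

On the target, memwrite would reduce to `uart_write((volatile struct uart_regs *)0x10000000, ptr, len)` with the address from the excerpt. On a host, passing the address of an ordinary zero-initialized `struct uart_regs` lets you run the loop without hardware.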
Is it possible to disable the FP16 vector test cases?
Yes, they shouldn't be enabled by default.
rvv/config.h should exclude them with the mask by default, but maybe I've missed something. Can you share after which instruction you get an illegal instruction/where the problem is?
Hi, I saw the throughput results for each RVV instruction here: https://camel-cdr.github.io/rvv-bench-results/bpi_f3/index.html
If I want to measure the execution throughput of each RVV instruction on another RISC-V board, could you give me some guidance?
I also wonder how you measure the execution throughput.
Thanks,