camel-cdr / rvv-bench

A collection of RISC-V Vector (RVV) benchmarks to help developers write portably performant RVV code
MIT License

how to measure vector load performance #15

Open gzc1090 opened 4 months ago

gzc1090 commented 4 months ago

Hi, I find your benchmark to be very valuable. Do you have any good ideas or suggestions for testing the performance (throughput or latency) of various vector load instructions? I would like to explore the vector load performance on the K1 and K230.

Thanks

camel-cdr commented 4 months ago

Hi, I played around with adding vector loads/stores to the single-instruction measurements, but I concluded that they would be more useful as a separate benchmark.

https://github.com/camel-cdr/rvv-bench/issues/12 has some measurements that show how strided loads perform with different stride values. Ideally we'd measure something like that, with the data coming from each cache level and from main memory. I'm not sure how to do those measurements properly, though. They should probably also account for different prefetch strategies.
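One way to structure such a measurement is to sweep two parameters: the working-set size (chosen to fit in L1, in L2, or only in DRAM) and the stride. For each pair, report the time per loaded element. Below is a minimal, portable C sketch of that harness, built on a few assumptions. The inner loop is scalar so it compiles and runs anywhere. On RVV hardware, that loop is where a strided load such as vlse32.v would go. Timing uses POSIX clock_gettime, and the function names now_ns, strided_pass and ns_per_load are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: monotonic time in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Visits every element of buf exactly once, in strided order:
   start, start+stride, start+2*stride, ... for each start in [0, stride).
   Scalar stand-in (assumption): on RVV the inner loop would become strided
   vector loads with a byte stride of stride * sizeof(int32_t). */
static int64_t strided_pass(const int32_t *buf, size_t len, size_t stride)
{
    int64_t sum = 0;
    for (size_t start = 0; start < stride && start < len; start++)
        for (size_t i = start; i < len; i += stride)
            sum += buf[i];
    return sum;
}

/* Average nanoseconds per loaded element for one working-set size (bytes)
   and one stride (in elements). Returns -1.0 if allocation fails. */
static double ns_per_load(size_t bytes, size_t stride, int reps)
{
    size_t len = bytes / sizeof(int32_t);
    assert(len > 0 && stride > 0 && reps > 0);
    int32_t *buf = malloc(len * sizeof *buf);
    if (!buf)
        return -1.0;
    for (size_t i = 0; i < len; i++)
        buf[i] = (int32_t)i;

    /* Warm-up pass: if the working set fits in a cache level, it is now resident. */
    volatile int64_t sink = strided_pass(buf, len, stride);

    uint64_t t0 = now_ns();
    for (int r = 0; r < reps; r++) {
        buf[0] = r; /* changes the input so the pass cannot be hoisted out of the loop */
        sink += strided_pass(buf, len, stride);
    }
    uint64_t t1 = now_ns();

    free(buf);
    return (double)(t1 - t0) / ((double)len * (double)reps);
}
```

Calling ns_per_load over a few sizes (for example 16 KiB, 256 KiB and 64 MiB) and strides 1, 2, 4, ..., 64 gives a stride-by-size grid. Which sizes land in which cache depends on the K1 or K230 cache configuration, so check that configuration separately. The timing also includes loop and add overhead. Because the strides here are regular, hardware prefetchers may hide part of the memory latency. That is one reason prefetch behaviour matters for these numbers.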

For now, you can look at the LUT4 and ascii to utf16/utf32 benchmarks; some of their implementations use indexed, strided and segmented loads.
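To make the three access patterns mentioned above concrete, here is a scalar C model of which memory each destination element comes from. It covers 32-bit elements, unmasked, following the RVV 1.0 definitions of vlse32.v (strided, byte stride), vluxei32.v/vloxei32.v with SEW=32 (indexed, 32-bit unsigned byte offsets) and vlseg<nf>e32.v (segmented). The model_* names are hypothetical helpers, not intrinsics. The model ignores masking, tail policy and the ordered/unordered distinction between the two indexed forms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* vlse32.v: element i comes from base + i * stride_bytes
   (the stride is in bytes and may be zero or negative). */
static void model_vlse32(uint32_t *vd, const void *base, ptrdiff_t stride_bytes, size_t vl)
{
    const unsigned char *p = base;
    for (size_t i = 0; i < vl; i++)
        memcpy(&vd[i], p + (ptrdiff_t)i * stride_bytes, sizeof vd[i]);
}

/* vluxei32.v / vloxei32.v: element i comes from base + index[i],
   where index[i] is an unsigned BYTE offset, not an element index. */
static void model_vlxei32(uint32_t *vd, const void *base, const uint32_t *index, size_t vl)
{
    const unsigned char *p = base;
    for (size_t i = 0; i < vl; i++)
        memcpy(&vd[i], p + index[i], sizeof vd[i]);
}

/* vlseg<nf>e32.v: memory holds vl records of nf consecutive 32-bit fields;
   field f of record i lands in element i of destination group f. */
static void model_vlsege32(uint32_t *vd[], size_t nf, const void *base, size_t vl)
{
    assert(nf >= 2 && nf <= 8); /* segment loads allow 2..8 fields */
    const unsigned char *p = base;
    for (size_t i = 0; i < vl; i++)
        for (size_t f = 0; f < nf; f++)
            memcpy(&vd[f][i], p + (i * nf + f) * sizeof(uint32_t), sizeof(uint32_t));
}
```

For example, with nf = 2 a segmented load splits interleaved pairs such as (re, im) into two separate registers in one instruction. A strided load with a byte stride of 8 can fetch only one of the two fields.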

If you have suggestions, please share them. I was planning to look at some memory measurements done on other ISAs, but I haven't gotten around to that yet.
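For latency rather than throughput, a technique commonly used to measure memory on other ISAs is pointer chasing. Each load's address is the value returned by the previous load, so the loads cannot overlap. The time per step therefore approximates load-to-use latency for a given working-set size. The sketch below builds a single random cycle with Sattolo's algorithm, where the random order is meant to defeat simple stride prefetchers, and times a scalar chase. A vector variant is only an assumption here. One could make each vector load's base address depend on an element extracted from the previous load (for example with vmv.x.s), but the result would then also include the vector-to-scalar move latency. The names xorshift64, make_cycle and chase_ns_per_load are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: monotonic time in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Small xorshift64 PRNG; the state must be non-zero. */
static uint64_t xorshift64(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *state = x;
}

/* Sattolo's algorithm: shuffles next[] into a permutation that is one
   single cycle, so following i -> next[i] visits all n slots before repeating. */
static void make_cycle(size_t *next, size_t n, uint64_t seed)
{
    assert(n > 1 && seed != 0);
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)(xorshift64(&seed) % i); /* j in [0, i), never i itself */
        size_t t = next[i];
        next[i] = next[j];
        next[j] = t;
    }
}

/* Average nanoseconds per dependent load for a working set of `bytes`.
   Returns -1.0 if allocation fails. */
static double chase_ns_per_load(size_t bytes, size_t steps)
{
    size_t n = bytes / sizeof(size_t);
    assert(n > 1 && steps > 0);
    size_t *next = malloc(n * sizeof *next);
    if (!next)
        return -1.0;
    make_cycle(next, n, 0x9E3779B97F4A7C15u);

    size_t p = 0;
    for (size_t i = 0; i < n; i++) /* warm-up: walk the whole cycle once */
        p = next[p];

    uint64_t t0 = now_ns();
    for (size_t i = 0; i < steps; i++)
        p = next[p]; /* the address of this load is the value of the previous one */
    uint64_t t1 = now_ns();

    volatile size_t sink = p; /* keeps the chain from being optimized away */
    (void)sink;
    free(next);
    return (double)(t1 - t0) / (double)steps;
}
```

Sweeping bytes over the same sizes as a throughput sweep should show roughly one latency plateau per cache level. Many existing tools place one node per cache line instead of one per word, so that consecutive hops never hit the same line. This sketch does not do that.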