gzc1090 opened 4 months ago
Hi, I played around with adding the vector loads/stores to the single-instruction measurements, but came to the conclusion that they would be more useful in a separate benchmark.
https://github.com/camel-cdr/rvv-bench/issues/12 has some measurements that show how different stride values perform. Ideally we'd measure something like that, with the data coming from each cache level and from main memory. I'm not sure how to do those measurements properly, though, and they should probably also take different prefetch strategies into account.
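One way to get numbers like the ones in the linked issue for each cache level is to sweep the buffer size and the stride independently and report the time per load. Below is a minimal sketch in portable scalar C, written under my own assumptions and not taken from rvv-bench: `strided_ns_per_load` and `now_ns` are hypothetical names, it assumes a POSIX `clock_gettime`, and an actual RVV version would replace the inner loop with strided vector loads (for example the `__riscv_vlse32_v_i32m1` intrinsic).

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime / CLOCK_MONOTONIC */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Monotonic wall-clock time in nanoseconds. */
static double now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Hypothetical helper: average ns per 32-bit load when a buffer of `bytes`
 * bytes is read with a stride of `stride` elements. The `off` loop makes
 * every pass touch every element exactly once, so the working set is `bytes`
 * for any stride and the buffer size alone decides whether the data sits in
 * L1, L2, the last-level cache or DRAM. Returns -1.0 if malloc fails. */
static double strided_ns_per_load(size_t bytes, size_t stride, int passes,
                                  volatile uint32_t *sink)
{
    size_t n = bytes / sizeof(uint32_t);
    assert(n > 0 && stride > 0 && passes > 0);
    uint32_t *buf = malloc(n * sizeof *buf);
    if (!buf)
        return -1.0;
    for (size_t i = 0; i < n; i++)
        buf[i] = (uint32_t)i;  /* also faults the pages in before timing */

    uint32_t acc = 0;          /* unsigned, so wrap-around is well defined */
    double t0 = now_ns();
    for (int p = 0; p < passes; p++)
        for (size_t off = 0; off < stride; off++)
            /* Scalar stand-in for one strided sweep; an RVV version would
             * strip-mine this loop with vsetvl and use vlse32.v instead. */
            for (size_t i = off; i < n; i += stride)
                acc += buf[i];
    double t1 = now_ns();

    *sink = acc;               /* keep the loads from being optimized away */
    free(buf);
    return (t1 - t0) / ((double)n * passes);
}
```

Sweeping `bytes` from a few KiB to well past the last-level cache size, for strides of 1, 2, 4, ... elements, gives one curve per stride; taking the minimum over several runs reduces noise. Comparing small and large strides at DRAM sizes should also show how much the hardware prefetcher helps.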
For now, you can look at the LUT4 and ASCII to UTF-16/UTF-32 benchmarks, where some of the implementations use indexed, strided and segmented loads.
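For reference when building such benchmarks, here are the memory addresses each of those three load forms reads, as scalar C models written from the RVV 1.0 specification. The function names are made up for illustration, and the models only describe which elements are loaded, not how any particular core executes the instructions. Note that indexed offsets are byte offsets, and that a segment load de-interleaves an array of structs into separate registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model of vlse32.v: element i comes from base + i * stride_bytes. */
static void model_vlse32(uint32_t *vd, const uint8_t *base,
                         ptrdiff_t stride_bytes, size_t vl)
{
    for (size_t i = 0; i < vl; i++)
        memcpy(&vd[i], base + (ptrdiff_t)i * stride_bytes, sizeof vd[i]);
}

/* Model of vluxei32.v: element i comes from base + offset[i] (bytes). */
static void model_vluxei32(uint32_t *vd, const uint8_t *base,
                           const uint32_t *offset, size_t vl)
{
    for (size_t i = 0; i < vl; i++)
        memcpy(&vd[i], base + offset[i], sizeof vd[i]);
}

/* Model of vlseg<nf>e32.v: field f of segment i comes from
 * base + (i * nf + f) * 4 and lands in register f, element i. */
static void model_vlseg_e32(uint32_t *const *vd, size_t nf,
                            const uint8_t *base, size_t vl)
{
    assert(nf >= 1 && nf <= 8);  /* RVV segment loads have at most 8 fields */
    for (size_t i = 0; i < vl; i++)
        for (size_t f = 0; f < nf; f++)
            memcpy(&vd[f][i], base + (i * nf + f) * sizeof(uint32_t),
                   sizeof(uint32_t));
}
```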
If you have suggestions, please share them. I was planning to look at some memory measurements done on other ISAs, but I haven't gotten around to that yet.
Hi, I find your benchmark to be very valuable. Do you have any good ideas or suggestions for testing the performance (throughput or latency) of various vector load instructions? I would like to explore the vector load performance on the K1 and K230.
Thanks
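On the latency side of the question, the usual technique is pointer chasing: each load's address comes from the previous load, so the loads cannot overlap, whereas a throughput test issues independent loads whose addresses are known up front. Below is a scalar sketch of the latency half under my own assumptions (POSIX `clock_gettime`; `chase_ns_per_load`, `make_cycle` and `xorshift64` are hypothetical names). Turning it into a vector-load test would mean making the next address depend on a vector load's result, for example by extracting element 0 with `vmv.x.s`; that step is left out here.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime / CLOCK_MONOTONIC */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Monotonic wall-clock time in nanoseconds. */
static double now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Marsaglia xorshift64 PRNG; the state must be non-zero. */
static uint64_t xorshift64(uint64_t *s)
{
    uint64_t x = *s;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *s = x;
}

/* Sattolo's algorithm: fills next[] with a random permutation that is a
 * single cycle, so i = next[i] visits all n slots before returning to 0. */
static void make_cycle(size_t *next, size_t n, uint64_t seed)
{
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)(xorshift64(&seed) % i);  /* j in [0, i) */
        size_t t = next[i];
        next[i] = next[j];
        next[j] = t;
    }
}

/* Hypothetical helper: average ns per dependent load over a buffer of
 * `bytes` bytes. Because each address comes from the previous load, the
 * result approximates load-to-use latency for the cache level the buffer
 * fits in. Returns -1.0 if malloc fails. */
static double chase_ns_per_load(size_t bytes, size_t steps,
                                volatile size_t *sink)
{
    size_t n = bytes / sizeof(size_t);
    assert(n >= 2 && steps > 0);
    size_t *next = malloc(n * sizeof *next);
    if (!next)
        return -1.0;
    make_cycle(next, n, 0x9E3779B97F4A7C15u);

    size_t i = 0;
    double t0 = now_ns();
    for (size_t s = 0; s < steps; s++)
        i = next[i];            /* the next address depends on this load */
    double t1 = now_ns();

    *sink = i;                  /* keep the chain from being optimized away */
    free(next);
    return (t1 - t0) / (double)steps;
}
```

The random single cycle is there so that the hardware prefetcher cannot predict the next address, which a sequential or fixed-stride chain would allow.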