cessen / str_indices

Count and convert between various ways of indexing utf8 string slices.
Apache License 2.0

Add newlines, large benchmark, and throughput measures to benchmarks #14

CeleritasCelery closed 1 year ago

CeleritasCelery commented 1 year ago

Added a throughput measurement to the Criterion report, which makes it easier to see GB/s for a particular algorithm. I like looking at throughput because it lets you compare benchmarks regardless of input size.
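For readers unfamiliar with the metric, here is a minimal std-only sketch of what a throughput figure boils down to: bytes processed divided by elapsed time. It is not the code from this PR; `throughput_gb_per_s`, `measure`, and `count_chars` are hypothetical names used only for illustration, and Criterion does its own, far more careful timing and statistics.

```rust
use std::time::{Duration, Instant};

/// Throughput in GB/s (10^9 bytes per second) for `bytes` processed in `elapsed`.
fn throughput_gb_per_s(bytes: usize, elapsed: Duration) -> f64 {
    bytes as f64 / elapsed.as_secs_f64() / 1e9
}

/// Hypothetical stand-in for a function under test: counts the chars in `text`.
fn count_chars(text: &str) -> usize {
    text.chars().count()
}

/// Runs `f` over `text` `iters` times and returns the average throughput in GB/s.
/// `black_box` keeps the compiler from optimizing the measured work away.
fn measure<F: Fn(&str) -> usize>(text: &str, iters: u32, f: F) -> f64 {
    let start = Instant::now();
    for _ in 0..iters {
        std::hint::black_box(f(std::hint::black_box(text)));
    }
    throughput_gb_per_s(text.len() * iters as usize, start.elapsed())
}
```

Because the result is normalized by input size, calling something like `measure(text, 1000, count_chars)` on a small text and on a large one gives numbers that can be compared directly.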

Also added a large benchmark that will test crossing chunk boundaries. Its results should be near the maximum throughput.

Lastly added the following line separators to the benchmarks text:

The benchmark names no longer exactly match their size. I am not sure if this is a problem or not. Let me know what you think of the changes.

cessen commented 1 year ago

> The benchmark names no longer exactly match their size. I am not sure if this is a problem or not.

Definitely not a problem! As long as they're within a few percent, it's fine. Especially since you're listing the throughput now anyway.

Having said that, it occurs to me that this probably isn't really exercising the line functions very well, since the line breaks are so infrequent. I wonder if it might make more sense to leave the existing benchmark texts alone, and instead create new texts (probably just 100-ish bytes, which we can extend at run time with repeat()) with really frequent line breaks, and corresponding new benchmarks just for the line conversion functions.
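A rough sketch of that idea, assuming nothing about the actual benchmark files: a short seed with a line break after every few bytes, extended to the target size with `str::repeat`. The seed contents and the name `line_heavy_text` are made up for illustration.

```rust
/// Builds a text of at least `min_bytes` bytes in which line breaks are very
/// frequent, by repeating a short seed whose lines all end with `line_ending`.
fn line_heavy_text(line_ending: &str, min_bytes: usize) -> String {
    // Hypothetical seed lines (not the text used in the PR); one is empty so
    // that back-to-back line breaks are exercised too.
    let seed: String = ["Hello", "こんにちは", "a b c", "", "line five", "x"]
        .iter()
        .map(|line| format!("{}{}", line, line_ending))
        .collect();

    // Round up so the result is never shorter than requested.
    let repeats = (min_bytes + seed.len() - 1) / seed.len();
    seed.repeat(repeats)
}
```

Calling it once per line ending, for example `line_heavy_text("\n", 1000)` and `line_heavy_text("\r\n", 1000)`, gives each line ending its own input, so each can be benchmarked in isolation.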

CeleritasCelery commented 1 year ago

Let me know what you think of this change. I reverted the old text files and added a new one specifically for lines. It will now test each type of line ending independently.

cessen commented 1 year ago

Yes, this looks great! Thanks!