KokaKiwi / rust-hex

A basic crate to encode values to hexadecimal representation. Originally extracted from rustc-serialize.
https://crates.io/crates/hex
Apache License 2.0

hex::encode performance #18

Open vladignatyev opened 5 years ago

vladignatyev commented 5 years ago

Hi,

My code relies heavily on hex::encode from your crate. I'm not sure that the performance of hex::encode is a bottleneck in my project, but in some of my methods a single call to hex::encode accounts for about a quarter of the method's overall running time. That's too long for my crypto-related code, even though hex is a very nice crate.

I started investigating and added a bench for hex::encode in my fork. Interestingly, the performance of hex::encode depends non-linearly on the length of the input.

Here is the output of cargo bench for reference.

running 6 tests
test tests::bench_encode_512bits ... bench:         962 ns/iter (+/- 139)
test tests::bench_encode_256bits ... bench:         518 ns/iter (+/- 61)
test tests::bench_encode_128bits ... bench:         287 ns/iter (+/- 42)
test tests::bench_encode_64bits  ... bench:         163 ns/iter (+/- 93)
test tests::bench_encode_32bits  ... bench:         123 ns/iter (+/- 15)
test tests::bench_encode_16bits  ... bench:          95 ns/iter (+/- 63)
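
For reference, a bench along these lines can be built on the nightly-only test::Bencher API. The following is only a minimal sketch approximating the benches named above, not the actual code in the fork:

    #![feature(test)]
    extern crate test;

    #[cfg(test)]
    mod tests {
        use test::{black_box, Bencher};

        // Measures hex::encode for a fixed 256-bit input; black_box keeps the
        // optimizer from eliding the call under test.
        #[bench]
        fn bench_encode_256bits(b: &mut Bencher) {
            let data = [0xABu8; 32]; // 32 bytes = 256 bits
            b.iter(|| black_box(hex::encode(&data)));
        }
    }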

What I'm going to do is try to improve the performance of this method. Please vote up or comment if my investigation and pull requests would be appreciated here.

Thanks!

vladignatyev commented 5 years ago

But it looks like it is impossible to do better 🐶

LukasKalbertodt commented 5 years ago

Hi there! Just a few quick comments from my side.

The non-linearity you see is probably caused by two things: constant measurement overhead and, more importantly, memory allocation. hex::encode allocates a String each time you call it. This has significant overhead (a linear part that scales with the size of the allocation, plus a significant constant part). I am fairly sure the relationship between input size and time gets pretty close to linear for larger inputs (16 KB or so).
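
One way to keep that per-call allocation out of a hot loop is to encode into a buffer you already own. Recent versions of the crate expose hex::encode_to_slice for this (hex 0.4+, so possibly newer than what this thread discusses); the helper name below is just an illustration:

    use std::io::Write;

    // Reuses one stack buffer instead of allocating a fresh String per call.
    // encode_to_slice fails only if the output buffer is not exactly twice
    // the input length.
    fn write_all_hex<W: Write>(mut w: W, items: &[[u8; 32]]) -> std::io::Result<()> {
        let mut buf = [0u8; 64]; // 2 hex chars per input byte
        for item in items {
            hex::encode_to_slice(item, &mut buf).expect("buffer is 2x input length");
            w.write_all(&buf)?;
            w.write_all(b"\n")?;
        }
        Ok(())
    }

This removes the constant allocation cost from the loop; the linear cost of writing the hex digits themselves remains.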

About improving performance: for larger inputs, you will be able to improve performance a whole lot by using SIMD instructions. A quick search brought up the faster-hex crate, which implements exactly that. I doubt it will be faster for tiny inputs, though.
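
To illustrate the idea only (this is not faster-hex's actual code): with SSSE3, sixteen input bytes can be split into high and low nibbles and mapped to ASCII digits with one pshufb table lookup each. A rough sketch of such a 16-byte step, assuming x86_64 and that the caller has checked for SSSE3 at runtime:

    #[cfg(target_arch = "x86_64")]
    use std::arch::x86_64::*;

    // Encodes exactly 16 input bytes into 32 lowercase hex digits.
    // Safety: the caller must ensure SSSE3 is available (e.g. via
    // is_x86_feature_detected!("ssse3")) and that dst holds at least 32 bytes.
    #[cfg(target_arch = "x86_64")]
    #[target_feature(enable = "ssse3")]
    unsafe fn hex_encode_16(src: &[u8; 16], dst: &mut [u8]) {
        // pshufb lookup table: nibble value -> ASCII hex digit.
        let table = _mm_setr_epi8(
            b'0' as i8, b'1' as i8, b'2' as i8, b'3' as i8,
            b'4' as i8, b'5' as i8, b'6' as i8, b'7' as i8,
            b'8' as i8, b'9' as i8, b'a' as i8, b'b' as i8,
            b'c' as i8, b'd' as i8, b'e' as i8, b'f' as i8,
        );
        let mask = _mm_set1_epi8(0x0f);
        let input = _mm_loadu_si128(src.as_ptr() as *const __m128i);
        // High and low nibble of every byte.
        let hi = _mm_and_si128(_mm_srli_epi64::<4>(input), mask);
        let lo = _mm_and_si128(input, mask);
        // Translate nibbles to ASCII, then interleave so the high digit of
        // each byte comes first in the output.
        let hi_chars = _mm_shuffle_epi8(table, hi);
        let lo_chars = _mm_shuffle_epi8(table, lo);
        _mm_storeu_si128(dst.as_mut_ptr() as *mut __m128i,
                         _mm_unpacklo_epi8(hi_chars, lo_chars));
        _mm_storeu_si128(dst.as_mut_ptr().add(16) as *mut __m128i,
                         _mm_unpackhi_epi8(hi_chars, lo_chars));
    }

A full encoder would loop over 16-byte chunks and fall back to a scalar loop for the tail, which is also why the gain only shows up for larger inputs.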

Also, related: #14

KokaKiwi commented 5 years ago

Hi, just saying that with the latest commit b86f391835bf6984b472ebad25a5282a5cc7e0a9 I managed to double performance, according to the benchmarks currently implemented:

test bench::a_bench ... bench:      31,055 ns/iter (+/- 11,988) = 412 MB/s

Compared to previous:

test bench::a_bench ... bench:      52,996 ns/iter (+/- 5,083) = 224 MB/s

taiki-e commented 3 years ago

FYI: #62 and #64 will greatly (7-10x) improve the performance of both encoding and decoding.