Open quietvoid opened 4 years ago
Ahh, sorry about the performance issues, and thank you for creating this issue!
The releases target x86_64-unknown-linux-musl because of compatibility issues I had with stable-x86_64-unknown-linux-gnu when distributing another binary. I don't exactly remember the details, but I seem to recall that users sometimes had issues with non-statically linked binaries.
Do you have links/info about musl target performance regressions? I'm curious why it's slow, and if it's something that's going to be fixed soon.
I'm thinking that at the very least, I should build and publish stable-x86_64-unknown-linux-gnu binaries.
It might also be a good idea to make stable-x86_64-unknown-linux-gnu binaries the default. I wish I remembered what kind of issues I encountered when distributing non-statically linked binaries before 😅
There are issues with distributing binaries linked against glibc, such as different versions across distributions.
For example, I couldn't reuse a build made on Arch on a Debian 10 machine.
So, in part, x86_64-unknown-linux-musl binaries are static and work almost everywhere, from what I understand.
Just googling a little (I had similar issues with other projects), this came up: https://www.reddit.com/r/rust/comments/gdycv8/why_does_musl_make_my_code_so_slow/
The comments mention regressions in ripgrep as well; it's a good read.
I'm also trying to benchmark the hash function to see if it can be improved. 120 MB/s on an SSD definitely feels slow.
Whoops! I fat-fingered the close button.
Thanks for the links! I read through those threads, and it looks like musl performance issues fall into two main categories:
I'm not using threads, nor, I think, doing a whole lot of heap allocation during hashing, but it might be worth seeing if using jemalloc helps.
My hashing inner loop is pretty terrible: it goes a byte at a time because I was too lazy to think about piece boundaries, so that could be greatly improved. Also, I only buffer piece size bytes at a time when reading from a file. Increasing the amount of data that's buffered would result in fewer I/O calls, although it could conceivably reduce parallelism.
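To make the piece-boundary idea concrete, here is a minimal sketch of feeding the hasher whole slices instead of single bytes, with a read buffer whose size is independent of the piece size. This is not the project's actual code: `PieceHasher`, `Fnv1a`, and `hash_pieces` are hypothetical names, and the toy FNV-1a hasher only stands in for a real digest such as SHA-1 so that the example runs without external crates.

```rust
use std::io::{self, ErrorKind, Read};

/// Hypothetical stand-in for a real digest (e.g. SHA-1); not the project's API.
trait PieceHasher {
    /// Feed a slice of bytes into the current piece.
    fn update(&mut self, data: &[u8]);
    /// Return the digest of the current piece and reset for the next one.
    fn finish_piece(&mut self) -> u64;
}

const FNV_OFFSET: u64 = 0xcbf29ce484222325;
const FNV_PRIME: u64 = 0x100000001b3;

/// Toy FNV-1a hasher, used only to keep the sketch self-contained.
struct Fnv1a(u64);

impl Fnv1a {
    fn new() -> Self {
        Fnv1a(FNV_OFFSET)
    }
}

impl PieceHasher for Fnv1a {
    fn update(&mut self, data: &[u8]) {
        // A real digest would consume the slice in blocks here.
        for &b in data {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(FNV_PRIME);
        }
    }

    fn finish_piece(&mut self) -> u64 {
        std::mem::replace(&mut self.0, FNV_OFFSET)
    }
}

/// Hash `reader` in pieces of `piece_len` bytes, reading `buf_len` bytes at a time.
/// The two sizes are independent, so the buffer can be much larger than a piece.
fn hash_pieces<R: Read, H: PieceHasher>(
    mut reader: R,
    hasher: &mut H,
    piece_len: usize,
    buf_len: usize,
) -> io::Result<Vec<u64>> {
    assert!(piece_len > 0 && buf_len > 0);
    let mut buf = vec![0u8; buf_len];
    let mut pieces = Vec::new();
    let mut in_piece = 0usize; // bytes already fed into the current piece

    loop {
        let n = match reader.read(&mut buf) {
            Ok(0) => break,
            Ok(n) => n,
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        let mut chunk = &buf[..n];
        while !chunk.is_empty() {
            // Take as much as fits before the next piece boundary.
            let take = (piece_len - in_piece).min(chunk.len());
            hasher.update(&chunk[..take]);
            in_piece += take;
            chunk = &chunk[take..];
            if in_piece == piece_len {
                pieces.push(hasher.finish_piece());
                in_piece = 0;
            }
        }
    }
    // A trailing partial piece still gets its own digest.
    if in_piece > 0 {
        pieces.push(hasher.finish_piece());
    }
    Ok(pieces)
}
```

Calling `hash_pieces(file, &mut Fnv1a::new(), piece_len, 1 << 20)` on an opened file would read 1 MiB at a time regardless of piece size. The resulting digests depend only on `piece_len`, not on `buf_len`, so the buffer size can be tuned separately.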
All in all, I think it's probably a good idea to try to tackle the performance issues holistically, and see if that resolves some of the discrepancy between musl and glibc. If it doesn't, I can start building and distributing glibc binaries, although I'd rather hold off on that, since it might introduce portability issues.
I fleshed out #26, an issue for performance improvements, with some concrete ideas and suggestions. I'm not sure how much time I'll have over the next week or so, but I would like to improve this. I think that hashing performance is very important for a torrent creator.
Ah, I hadn't seen #26.
I do think the byte-per-byte loop is the issue for performance. I'll comment more on the issue itself.
Ah, I hadn't seen #26.
No worries, I think performance degradation w/musl merits its own issue.
I do think the byte-per-byte loop is the issue for performance. I'll comment more on the issue itself.
Thanks!
@casey I guess 0.1.8 fixes this? The performance difference is much lower. Unless tests are required first.
I think this is probably fixed, although let's leave it open for now. I'm going to benchmark and profile the differences between glibc and musl, both because I'm curious where the slowdown is coming from and how the performance fix impacted it, and because I'd like to know how much slower musl is, so that I can track future regressions.
Roughly 200 MiB/s in my case. It's not even peaking above 50% of drive performance.
I don't know if it's related to this or #26 but it's certainly an issue.
Is there a reason current releases are built against x86_64-unknown-linux-musl? This seems to cause a large performance decrease regardless of the I/O speed of the device where the hashing takes place. It seems to affect both the binaries in the AUR and the ones from the one-line install bash script.
When building master for stable-x86_64-unknown-linux-gnu, hashing speed is around 119 MiB/s. Using the release binaries, the speed is limited to 68.12 MiB/s. There are known performance regressions when targeting musl for some reason; I'm just wondering if there is a good reason behind it.
Thanks.