casey / intermodal

A command-line utility for BitTorrent torrent file creation, verification, and more
https://imdl.io
Creative Commons Zero v1.0 Universal

Mostly fixed: Slow hashing with binaries targeting x86_64-unknown-linux-musl #440

Open quietvoid opened 4 years ago

quietvoid commented 4 years ago

Is there a reason current releases are built against x86_64-unknown-linux-musl? This seems to cause a large performance decrease, regardless of the I/O speed of the device on which the hashing takes place.

It seems to affect both the binaries in the AUR and the ones from the one-line install bash script.

When I build master with the stable-x86_64-unknown-linux-gnu toolchain, hashing speed is around 119 MiB/s. With the release binaries, the speed is limited to 68.12 MiB/s.

There are known performance regressions when targeting musl, for some reason. I'm just wondering if there is a good reason behind the choice.

Thanks.

casey commented 4 years ago

Ahh, sorry about the performance issues, and thank you for creating this issue!

The releases target x86_64-unknown-linux-musl because of compatibility issues I had with stable-x86_64-unknown-linux-gnu when distributing another binary. I don't exactly remember the details, but I seem to recall that sometimes users were having issues with non-statically linked binaries.

Do you have links/info about musl target performance regressions? I'm curious why it's slow, and if it's something that's going to be fixed soon.

I'm thinking that at the very least, I should build and publish stable-x86_64-unknown-linux-gnu binaries.

It might also be a good idea to make stable-x86_64-unknown-linux-gnu binaries the default. I wish I remembered what kind of issues I encountered when distributing non-statically linked binaries before 😅

quietvoid commented 4 years ago

There are issues with distributing binaries with glibc, such as different versions across distributions. For example, I couldn't reuse a build made on Arch on a Debian 10 machine.

So, in part, x86_64-unknown-linux-musl binaries are static and work almost everywhere, from what I understand.

Just googling a little (I had similar issues with other projects), this comes up: https://www.reddit.com/r/rust/comments/gdycv8/why_does_musl_make_my_code_so_slow/ The comments mention regressions in ripgrep as well. It's a good read.

I'm also trying to benchmark the hash function to see if it can be improved. 120 MB/s on an SSD definitely feels slow.

casey commented 4 years ago

Whoops! I fat-fingered the close button.

casey commented 4 years ago

Thanks for the links! I read through those threads, and it looks like musl performance issues fall into two main categories:

I'm not using threads, nor, I think, doing a whole lot of heap allocation during hashing, but it might be worth seeing if using jemalloc helps.

My hashing inner loop is pretty terrible: it goes a byte at a time because I was too lazy to think about piece boundaries, so that could be greatly improved. Also, I only buffer piece-size bytes at a time when reading from a file. Increasing the amount of data that's buffered would result in fewer I/O calls, although it could conceivably reduce parallelism.
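As an illustration of the idea above (not the actual intermodal code), here is a minimal sketch of feeding a hasher in slices that are split at piece boundaries, instead of one byte at a time, with a read buffer whose size is independent of the piece length. `ToyHasher` and `hash_pieces` are hypothetical names, and the byte-sum "hash" is only a stand-in for a real digest such as SHA-1.

```rust
use std::io::{self, Read};

/// Hypothetical stand-in for a real digest (e.g. SHA-1): it only sums bytes.
#[derive(Default)]
struct ToyHasher {
    sum: u64,
}

impl ToyHasher {
    /// Absorb a whole slice in one call.
    fn update(&mut self, data: &[u8]) {
        for &b in data {
            self.sum = self.sum.wrapping_add(b as u64);
        }
    }

    /// Return the current value and reset the state for the next piece.
    fn finish(&mut self) -> u64 {
        std::mem::take(&mut self.sum)
    }
}

/// Read `reader` through a buffer of `buf_len` bytes and return one toy hash
/// per piece of `piece_len` bytes. The last piece may be shorter.
fn hash_pieces<R: Read>(mut reader: R, piece_len: usize, buf_len: usize) -> io::Result<Vec<u64>> {
    assert!(piece_len > 0 && buf_len > 0);
    let mut buf = vec![0u8; buf_len];
    let mut hasher = ToyHasher::default();
    let mut in_piece = 0; // bytes already hashed into the current piece
    let mut pieces = Vec::new();

    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            break; // end of input
        }
        let mut chunk = &buf[..n];
        while !chunk.is_empty() {
            // Take as much as fits in the current piece, in a single update.
            let take = (piece_len - in_piece).min(chunk.len());
            hasher.update(&chunk[..take]);
            in_piece += take;
            chunk = &chunk[take..];
            if in_piece == piece_len {
                pieces.push(hasher.finish());
                in_piece = 0;
            }
        }
    }

    // Emit the trailing partial piece, if any.
    if in_piece > 0 {
        pieces.push(hasher.finish());
    }
    Ok(pieces)
}
```

For example, `hash_pieces(&data[..], 4, 3)` and `hash_pieces(&data[..], 4, 64)` yield the same pieces for the same input, which shows that the buffer size can be tuned separately from the piece length.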

All in all, I think it's probably a good idea to try to tackle the performance issues holistically, and see if that resolves some of the discrepancy between musl and glibc. If it doesn't, I can start building and distributing glibc binaries, although I'd rather hold off on that, since it might introduce portability issues.

I fleshed out #26, an issue for performance improvements, with some concrete ideas and suggestions. I'm not sure how much time I'll have over the next week or so, but I would like to improve this. I think that hashing performance is very important for a torrent creator.

quietvoid commented 4 years ago

Ah, I hadn't seen #26.

I do think the byte-per-byte loop is the issue for performance. I'll comment more on the issue itself.

casey commented 4 years ago

> Ah, I hadn't seen #26.

No worries, I think performance degradation w/musl merits its own issue.

> I do think the byte-per-byte loop is the issue for performance. I'll comment more on the issue itself.

Thanks!

quietvoid commented 4 years ago

@casey I guess 0.1.8 fixes this? The performance difference is much smaller now. Unless tests are required first.

casey commented 4 years ago

I think it probably does, although let's leave it open for now. I'm going to benchmark and profile the differences between glibc and musl, both because I'm curious where the slowdown is coming from and how the performance fix impacted it, and because I'd like to know how much slower musl is, so that I can track future regressions.
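One simple way to get comparable numbers from both builds is to time the same workload and report throughput in MiB/s. The sketch below is only an assumption about how such a measurement could look; `mib_per_sec` and `measure` are hypothetical helpers, not part of intermodal.

```rust
use std::time::{Duration, Instant};

/// Convert a byte count and an elapsed time into MiB/s.
fn mib_per_sec(bytes: u64, elapsed: Duration) -> f64 {
    (bytes as f64 / (1024.0 * 1024.0)) / elapsed.as_secs_f64()
}

/// Run `work`, which is assumed to process `bytes` bytes, and return the
/// observed throughput in MiB/s.
fn measure<F: FnOnce()>(bytes: u64, work: F) -> f64 {
    let start = Instant::now();
    work();
    mib_per_sec(bytes, start.elapsed())
}
```

Running the same closure in a musl build and a glibc build, on the same input and with a warm page cache, would give figures in the same units as the ones quoted in this thread.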

GottZ commented 10 months ago

[image] Roughly 200 MiB/s in my case. It's not even peaking above 50% of drive performance.

I don't know if it's related to this or #26, but it's certainly an issue.