Closed kslr closed 9 months ago
Thanks for the thorough information, I'll take a look soon.
Very interesting CPU profile. I take it your instance is really hauling ass (at least for the anacrolix/torrent implementation); there must be a lot of data going through it. I'm surprised to see hashing be such an issue. It should be possible to use a faster hash for the smart ban cache, which seems to account for about 60% of the SHA1 hashing overhead. The smart ban hash can be anything that's cryptographic, or possibly anything that accepts a seed or can be salted (it just needs to be unguessable by an attacker; it's not critical). I wonder if I should provide the ability to turn off the smart ban cache, or use a faster hash.
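To make the "seeded hash" idea concrete, here is a minimal sketch and not the library's actual smart ban code. It assumes the standard library's hash/maphash as the seeded hash (it is randomly seeded per process but not cryptographic), and the names blockKey, blockRecorder, record and badPeers are hypothetical, made up for this example.

```go
package main

import (
	"fmt"
	"hash/maphash"
)

// blockKey identifies a block within a piece (hypothetical type for this sketch).
type blockKey struct {
	piece, offset int
}

// blockRecorder remembers a seeded hash of every block each peer sent,
// instead of keeping the block data or a SHA1 digest of it.
type blockRecorder struct {
	seed   maphash.Seed                   // random per process, so peers cannot predict hashes
	hashes map[string]map[blockKey]uint64 // peer -> block -> seeded hash
}

func newBlockRecorder() *blockRecorder {
	return &blockRecorder{
		seed:   maphash.MakeSeed(),
		hashes: make(map[string]map[blockKey]uint64),
	}
}

// record stores the seeded hash of data as received from peer for block k.
func (r *blockRecorder) record(peer string, k blockKey, data []byte) {
	m := r.hashes[peer]
	if m == nil {
		m = make(map[blockKey]uint64)
		r.hashes[peer] = m
	}
	m[k] = maphash.Bytes(r.seed, data)
}

// badPeers returns the peers whose recorded hash for block k differs from
// the hash of the data that later passed piece verification.
func (r *blockRecorder) badPeers(k blockKey, good []byte) []string {
	want := maphash.Bytes(r.seed, good)
	var bad []string
	for peer, m := range r.hashes {
		if h, ok := m[k]; ok && h != want {
			bad = append(bad, peer)
		}
	}
	return bad
}

func main() {
	r := newBlockRecorder()
	k := blockKey{piece: 0, offset: 0}
	r.record("peerA", k, []byte("good block"))
	r.record("peerB", k, []byte("corrupt block"))
	fmt.Println(r.badPeers(k, []byte("good block")))
}
```

The point of the seed is that a malicious peer cannot craft a bad block that collides with the good one without knowing it, which is the "unguessable by an attacker" property mentioned above.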
The other thing of note is a non-negligible scheduling overhead. It might take more than a CPU trace to determine if things are optimal there. But certainly the main download path blocker is anything under receiveChunk so it's best to optimize that. Since piece hashing doesn't block the download path, I think if the smart ban stuff is optimized you will see huge performance gains.
This looks promising
~/ags/torrent % go test -run @ -bench SmartBan
goos: darwin
goarch: arm64
pkg: github.com/anacrolix/torrent
BenchmarkSmartBanRecordBlock/xxhash-10 868774 1433 ns/op 11431.27 MB/s
BenchmarkSmartBanRecordBlock/sha1-10 172546 7025 ns/op 2332.19 MB/s
PASS
ok github.com/anacrolix/torrent 2.588s
@kslr Please try with https://github.com/anacrolix/torrent/tree/issue-905. It should be about 3x faster.
For context it looks like I forgot to include the smart ban block recording in the primary downloading benchmark.
Very nice work, it now runs at a crazy speed (~250MB) with CPU at ~200%, which I feel is more than enough, mainly because of my weaker CPU.
Allowing the smart ban to be disabled would, I think, cause other problems, and the boost from continuing to optimize hashing probably wouldn't be that great, so consider keeping it simple.
Thank you! I will merge the performance boost to main and release.
Hi, I'm looking for ways to increase download speeds. Using Deluge I can get ~150MB or so, but my headless client only reaches ~60MB (CPU 300%+). With go pprof I've noticed excessive CPU usage, and I'd like to know how to optimize that. For example: tweaking SQLite parameters, tweaking client parameters, or using bolt vs mmap?
Basic info: Ubuntu 22.04, 16GB memory, 1TB NVMe, 1G network, CPU E5-2650 (2.00GHz)
simple client code:
test data: nyaa seeder top 150
pprof file: cpu.pprof.zip
Thank you for your help