anacrolix / torrent

Full-featured BitTorrent client package and utilities
Mozilla Public License 2.0

High CPU usage of sha1.blockAMD64 #905

Closed: kslr closed this issue 9 months ago

kslr commented 9 months ago

Hi, I'm looking for ways to increase download speed. With Deluge I get ~150MB or so, but my headless client only reaches ~60MB (CPU 300%+). Using Go pprof I noticed excessive CPU usage, and I'd like to know how to optimize it, for example by tuning SQLite parameters, tuning client parameters, or using bolt vs. mmap storage?

Basic info: Ubuntu 22.04, 16 GB memory, 1 TB NVMe, 1G network, CPU E5-2650 (2.00 GHz)

simple client code:

cfg := torrent.NewDefaultClientConfig()
cfg.Bep20 = "-TR2770-"
cfg.ExtendedHandshakeClientVersion = "transmission 2.77"
cfg.HTTPUserAgent = "Transmission/2.77"
cfg.EstablishedConnsPerTorrent = 200
cfg.HalfOpenConnsPerTorrent = 100
cfg.TorrentPeersHighWater = 2000
cfg.ListenPort = 0
cfg.MaxUnverifiedBytes = 128 << 20 // 128 MiB
cfg.DisableAggressiveUpload = true
cfg.DefaultStorage = storage.NewFileByInfoHash("~/Downloads")
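
A side note on the storage line above: Go does not expand "~" in paths (that is a shell feature), so "~/Downloads" is treated as a directory literally named "~" under the working directory, unless the library expands it itself, which as far as I know it does not. A minimal sketch of expanding it by hand follows; `expandHome` is a hypothetical helper, not part of anacrolix/torrent.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// expandHome replaces a leading "~" in path with the given home directory.
// Hypothetical helper for illustration only.
func expandHome(path, home string) string {
	if path == "~" {
		return home
	}
	if strings.HasPrefix(path, "~/") {
		return filepath.Join(home, path[2:])
	}
	return path // no leading "~", leave untouched
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot determine home directory:", err)
		os.Exit(1)
	}
	// The result could then be passed to storage.NewFileByInfoHash.
	fmt.Println(expandHome("~/Downloads", home))
}
```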

test data: nyaa seeder top 150

pprof file: cpu.pprof.zip

Thank you for your help

anacrolix commented 9 months ago

Thanks for the thorough information, I'll take a look soon.

anacrolix commented 9 months ago

Very interesting CPU profile. I take it your instance is really hauling ass (at least for the anacrolix/torrent implementation); there must be a lot of data going through it. I'm surprised to see hashing be such an issue. It could be possible to use a faster hash for the smart ban cache, which seems to account for about 60% of the SHA1 hashing overhead. The smart ban hash can be anything that's cryptographic, or possibly anything that accepts a seed or can be salted (it just needs to be unguessable by an attacker; it's not critical). I wonder if I should provide the ability to turn off the smart ban cache, or use a faster hash.
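
To illustrate the "seeded and unguessable" idea, here is a minimal sketch of fingerprinting blocks with a randomly seeded fast hash. It is not the library's implementation: the benchmark later in the thread names xxhash, while this sketch swaps in `hash/maphash` from the standard library (Go 1.19+ for `maphash.Bytes`) to stay dependency-free. `blockHasher` and the 16 KiB block size are assumptions.

```go
package main

import (
	"crypto/sha1"
	"fmt"
	"hash/maphash"
)

// blockHasher fingerprints blocks using a per-process random seed, so a
// remote peer cannot predict the digest of the data it sends.
// Hypothetical type for illustration only.
type blockHasher struct {
	seed maphash.Seed
}

func newBlockHasher() blockHasher {
	return blockHasher{seed: maphash.MakeSeed()}
}

// sum returns a 64-bit seeded fingerprint of block.
func (h blockHasher) sum(block []byte) uint64 {
	return maphash.Bytes(h.seed, block)
}

func main() {
	h := newBlockHasher()
	good := make([]byte, 16<<10) // assumed 16 KiB block
	bad := make([]byte, 16<<10)
	bad[100] = 1 // simulate a corrupted block

	fmt.Println("same block matches:", h.sum(good) == h.sum(good))
	fmt.Println("corrupt block differs:", h.sum(good) != h.sum(bad))
	fmt.Printf("SHA1 digest: %d bytes, seeded digest: 8 bytes\n", sha1.Size)
}
```

The trade-off is that such digests are only meaningful within one process, since the seed is not persisted; that seems acceptable for deciding which peer sent a bad block during a session.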

The other thing of note is a non-negligible scheduling overhead. It might take more than a CPU trace to determine if things are optimal there. But certainly the main download path blocker is anything under receiveChunk so it's best to optimize that. Since piece hashing doesn't block the download path, I think if the smart ban stuff is optimized you will see huge performance gains.

anacrolix commented 9 months ago

This looks promising

~/ags/torrent % go test -run @ -bench SmartBan
goos: darwin
goarch: arm64
pkg: github.com/anacrolix/torrent
BenchmarkSmartBanRecordBlock/xxhash-10            868774          1433 ns/op    11431.27 MB/s
BenchmarkSmartBanRecordBlock/sha1-10              172546          7025 ns/op    2332.19 MB/s
PASS
ok      github.com/anacrolix/torrent    2.588s

anacrolix commented 9 months ago

@kslr Please try with https://github.com/anacrolix/torrent/tree/issue-905. It should be about 3x faster.

anacrolix commented 9 months ago

For context it looks like I forgot to include the smart ban block recording in the primary downloading benchmark.

kslr commented 9 months ago

Very nice work. It now reaches a crazy speed (~250MB) with CPU at ~200%, which I feel is more than enough, mainly because of my weaker CPU.

I think allowing the smart ban to be disabled would cause other problems, and the boost from continuing to optimize the hash probably wouldn't be that great, so consider keeping it simple.

anacrolix commented 9 months ago

Thank you! I will merge the performance boost to main and release.

anacrolix commented 9 months ago

Fixed in v1.54.1