anacrolix / torrent

Full-featured BitTorrent client package and utilities
Mozilla Public License 2.0
5.39k stars 615 forks source link

Slow downloads while multiple large torrents are being downloaded. #897

Closed veshij closed 4 months ago

veshij commented 5 months ago

We are observing extremely slow download speeds when multiple large torrents are being downloaded. We embed library in our application with following configuration:

    config := torrent.NewDefaultClientConfig()
    config.DhtStartingNodes = func(string) dht.StartingNodesGetter { return func() ([]dht.Addr, error) { return nil, nil } }
    config.UploadRateLimiter = rate.NewLimiter(32*1024*1024, 64*1024*1024)
    config.DownloadRateLimiter = rate.NewLimiter(128*1024*1024, 256*1024*1024)
    config.DialRateLimiter = rate.NewLimiter(128, 256)
    config.EstablishedConnsPerTorrent = 128
    config.DisableUTP = true
    config.NoDHT = true
    config.Seed = true
    config.DisableWebseeds = true
    config.ListenPort = 5502

All torrents have a piece size 256k. Some torrents distribute 5-6GB datafiles, some are fairly small, hundreds of MB.

We observe that when "large" torrent is being downloaded - this torrent and other torrents processed by the same process are downloaded way below DL/UL rate-limit speeds. Once large torrent is completed - "smaller" torrents are processed with regular/expected speeds.

Looks like most of the CPU time is spent in request-strategy.GetRequestablePieces:

Screenshot 2024-02-02 at 10 29 51 AM

pprof.gz

Could you suggest what we can change to resolve the issue? Thank you!

anacrolix commented 5 months ago

Thanks for the report. What version of the client are you using?

Some workarounds include: Limiting the number of pieces per torrent. There is an algorithm that does this include in this implementation. Another is to add a capacity to your storage. I think this will short circuit how deep the implementation consists potential requests. See the storage package interfaces for this.

Feel free to share a list of torrents that make the issue reproducible.

veshij commented 5 months ago

What version of the client are you using?

We use v1.53.2.

Limiting the number of pieces per torrent.

Yeah, looks like code does a lookup per each piece per each torrent, so it could help a bit.

Another is to add a capacity to your storage.

What do you mean by capacity, I/O?

Feel free to share a list of torrents that make the issue reproducible.

Unfortunately I can't share list of torrents. These single-file blobs of variable size, from 500MB to 5GB, all of them have 256k torrent block (which can't be easily changed).

veshij commented 5 months ago

Would it make sense to store pointer to original torrent.Torrent in PieceRequestOrderKey/PieceRequestOrderState so we can avoid map lookups in requestStrategyInput.Torrent()?

So if we have 50-100 active torrents, some of them have 10-30k pieces this Scan becomes very expensive. 1M map accesses per each call to PeerConn.maybeUpdateActualRequestState(), am I reading it right?

anacrolix commented 5 months ago

What version of the client are you using?

We use v1.53.2.

There are some critical updates in v1.53.3 that are worth getting. In particular a1105a055cf4138deba564535eb34995fa014a0b might help with your current performance issue.

Another is to add a capacity to your storage.

What do you mean by capacity, I/O?

This is the part of the code corresponding to https://github.com/anacrolix/torrent/blob/918e62852b4184cb360f13f69749708231831378/request-strategy/order.go#L51-L56. This value is derived from Capacity in https://pkg.go.dev/github.com/anacrolix/torrent@v1.53.3/storage#TorrentImpl. If your storage has a limited capacity, the client will avoid overwriting higher priority pieces with lower priority pieces allowing you to stream inside a limited buffer. It uses this to short circuit requests too. There is another related value, MaxUnverifiedBytes I'll mention below.

Feel free to share a list of torrents that make the issue reproducible.

Unfortunately I can't share list of torrents. These single-file blobs of variable size, from 500MB to 5GB, all of them have 256k torrent block (which can't be easily changed).

The algorithm is https://pkg.go.dev/github.com/anacrolix/torrent@v1.53.3/metainfo#ChoosePieceLength, it's used in a few places inside anacrolix/torrent by default, but you should try to use it in your own metainfo generation if you can. It is deterministic so once you switch over you can continue to get the benefits.

anacrolix commented 5 months ago

~Would it make sense to store pointer to original torrent.Torrent in PieceRequestOrderKey/PieceRequestOrderState so we can avoid map lookups in requestStrategyInput.Torrent()?~

So if we have 50-100 active torrents, some of them have 10-30k pieces this Scan becomes very expensive. 1M map accesses per each call to PeerConn.maybeUpdateActualRequestState(), am I reading it right?

Yes I think so. It's interesting to see how large the mapaccess cost is to the index from infohash to Torrent. I think the profile is showing that you have an insanely large number of pieces, because the map access should be a negligible cost here. Even the other more expensive lookups are smaller at this point. I noticed that MaxUnverifiedBytes isn't used to short circuit the piece generation, maybe try with this patch (in addition to v1.53.3 for the extra change above): https://github.com/anacrolix/torrent/compare/master...issue-897

It may cause some torrents to have to wait for others to clear the unverified bytes buffer, but you should see better CPU performance at least. I will make it the default based on your experience.

veshij commented 5 months ago

Thank you, I'll try patches right now. What'd be reasonable value for MaxUnverifiedBytes to start with? upd: looks like I'm using default MaxUnverifiedBytes: 64 << 20

p.s. I tried patching implementation to use per-torrent, but not global requestPieceOrder and seeing improved performance, but it's still below expectation/network rate-limit. Feels like avoiding map lookup during Scan() for finding torrent corresponding to piece might be a good optimization.

anacrolix commented 5 months ago

Thank you, I'll try patches right now. What'd be reasonable value for MaxUnverifiedBytes to start with? upd: looks like I'm using default MaxUnverifiedBytes: 64 << 20

p.s. I tried patching implementation to use per-torrent, but not global requestPieceOrder and seeing improved performance, but it's still below expectation/network rate-limit. Feels like avoiding map lookup during Scan() for finding torrent corresponding to piece might be a good optimization.

Avoiding the map lookup might only be a 20-30% improvement. Doing piece ordering specific to each torrent should give you an O(|torrents|) speedup, which sounds like it might be decent for your use case. That might be possible to do automatically if there is no shared storage capacity for your torrents.

What storage are you using? Did you get any performance changes with the 2 commits I recommended to use above?

veshij commented 5 months ago

Can't say I see a significant difference with these 2 patches.

anacrolix commented 5 months ago

I still need your storage implementation and config. It looks like it should be fairly easy to remove that map lookup, but as I mentioned, I'm not sure it will provide a huge win.

veshij commented 5 months ago

I do not specify explicit storage in torrent.Config. Each torrent is created via following code:

    torrentSpec, err := torrent.TorrentSpecFromMetaInfoErr(torrentMetaInfo)
    if err != nil {...}
    torrentSpec.Storage = newLimitedClient(storage.NewFile(downloadDir), fetcher.concurrencyLimiter)
    addedTorrent, isNew, err := torrentClient.AddTorrentSpec(torrentSpec)
       ...

where newLimitedClient limits parallel IO requests via semaphore (to limit side-effects on spinning disks). Not sure it's 100% required, but based on metrics it's not the bottleneck (I want to triple-check that though). After the patches and some other changes I made (getting rid of Torrent lookup + per-torrent piece-request-order structs) performance seems to be better, but I'll have better answer tomorrow when this system is more actively used. I'll try to isolate performance-changing changes tomorrow.

veshij commented 5 months ago

Looks like there is an improvement, but perf is still relatively low, especially when multiple torrents are downloaded. When downloading two torrent files in parallel, one with 10k pieces, second with 30k pieces I'm getting roughly ~2300 pieces/min, ~10MB/s. 30% of time (under global lock) is spent in torrent.requestStrategyTorrent.IgnorePiece (the code running includes "fix" for avoiding Torrent map lookup).

Screenshot 2024-02-07 at 10 51 08 AM
veshij commented 5 months ago

Should we return false here to stop tree traversal? https://github.com/anacrolix/torrent/blob/877c4affcd72050c9b6023dd4df2fef9d62d2f47/request-strategy/order.go#L82-L84

https://github.com/anacrolix/torrent/pull/901 is that's the case.

veshij commented 5 months ago

I added a bit of logging, it looks like Scan loop never finishes based on MaxUnverifiedBytes condition, in most cases in each iteration it processes all pieces and ends up with allTorrentsUnverifiedBytes equal to just a few pieces, sometimes none pieces is total number of items processed in Scan() MaxUnverifiedBytes is input.MaxUnverifiedBytes() allTorrentsUnverifiedBytes is allTorrentsUnverifiedBytes after Scan finishes

  |   | 2024-02-08 01:25:40 | DEBUG: processed 10618 pieces in 941.912µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 262144
  |   | 2024-02-08 01:25:40 | DEBUG: processed 10618 pieces in 733.896µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 0
  |   | 2024-02-08 01:25:40 | DEBUG: processed 10618 pieces in 964.139µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 524288
  |   | 2024-02-08 01:25:40 | DEBUG: processed 10618 pieces in 931.727µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 524288
  |   | 2024-02-08 01:25:40 | DEBUG: processed 10618 pieces in 995.079µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 262144
  |   | 2024-02-08 01:25:39 | DEBUG: processed 10618 pieces in 707.152µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 786432
  |   | 2024-02-08 01:25:39 | DEBUG: processed 10618 pieces in 707.072µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 786432
  |   | 2024-02-08 01:25:39 | DEBUG: processed 10618 pieces in 754.102µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 1835008

upd: actually in case of "fast" downloads I see reasonable numbers there, often maxing out MaxUnverifiedBytes and ignoring not too many pieces.

fast:

8 01:37:09.958265 04959 torrent_fetcher.go:189] [6231b11bf40d5861bea8a2bb6b1be9195ceba3b2-1c3ca765bccf392443778ccf9d2f0398] downloading: 24.75% (2599/10618 pieces), peer count: 10 total/10 active/0 pending/0 half open, connected seed count: 10
I0208 01:37:12.958715 04959 torrent_fetcher.go:189] [6231b11bf40d5861bea8a2bb6b1be9195ceba3b2-1c3ca765bccf392443778ccf9d2f0398] downloading: 37.36% (3935/10618 pieces), peer count: 11 total/11 active/0 pending/0 half open, connected seed count: 11
DEBUG: processed 8049 pieces in 882.251µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 17039360; ignored 7984
DEBUG: processed 7999 pieces in 897.7µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 17039360; ignored 7934
DEBUG: processed 7625 pieces in 850.502µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 17039360; ignored 7560
DEBUG: processed 7143 pieces in 803.434µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 17039360; ignored 7078
I0208 01:37:15.959585 04959 torrent_fetcher.go:189] [6231b11bf40d5861bea8a2bb6b1be9195ceba3b2-1c3ca765bccf392443778ccf9d2f0398] downloading: 49.87% (5258/10618 pieces), peer count: 11 total/11 active/0 pending/0 half open, connected seed count: 11
DEBUG: processed 6186 pieces in 784.117µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 17039360; ignored 6121
DEBUG: processed 5907 pieces in 664.852µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 17039360; ignored 5842
I0208 01:37:18.957390 04959 torrent_fetcher.go:189] [6231b11bf40d5861bea8a2bb6b1be9195ceba3b2-1c3ca765bccf392443778ccf9d2f0398] downloading: 62.54% (6601/10618 pieces), peer count: 10 total/10 active/0 pending/0 half open, connected seed count: 10
DEBUG: processed 5167 pieces in 751.535µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 17039360; ignored 5102
DEBUG: processed 5097 pieces in 582.187µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 17039360; ignored 5032
DEBUG: processed 5954 pieces in 660.525µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 17039360; ignored 5889
I0208 01:37:21.957372 04959 torrent_fetcher.go:189] [6231b11bf40d5861bea8a2bb6b1be9195ceba3b2-1c3ca765bccf392443778ccf9d2f0398] downloading: 76.20% (8058/10618 pieces), peer count: 10 total/10 active/0 pending/0 half open, connected seed count: 10
DEBUG: processed 9091 pieces in 1.006716ms with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 17039360; ignored 9026
I0208 01:37:24.957483 04959 torrent_fetcher.go:189] [6231b11bf40d5861bea8a2bb6b1be9195ceba3b2-1c3ca765bccf392443778ccf9d2f0398] downloading: 89.80% (9504/10618 pieces), peer count: 9 total/9 active/0 pending/0 half open, connected seed count: 9
DEBUG: processed 9950 pieces in 1.055328ms with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 17039360; ignored 9885
DEBUG: processed 10618 pieces in 802.863µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 6815744; ignored 10592
I0208 01:37:27.957792 04959 torrent_fetcher.go:183] [6231b11bf40d5861bea8a2bb6b1be9195ceba3b2-1c3ca765bccf392443778ccf9d2f0398] torrent downloaded.

slow:

I0208 01:47:10.011510 98844 torrent_fetcher.go:189] [2a730fe718b7bacdf7f9809a9bf39584d5e03a18-49f494c7ad130b7525e457e079cc126e] downloading: 88.26% (9369/10618 pieces), peer count: 121 total/121 active/0 pending/0 half open, connected seed count: 93
DEBUG: processed 10618 pieces in 913.435µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 262144; ignored 10617
DEBUG: processed 10618 pieces in 901.311µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 0; ignored 10618
DEBUG: processed 10618 pieces in 694.087µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 262144; ignored 10617
DEBUG: processed 10618 pieces in 679.059µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 262144; ignored 10617
DEBUG: processed 10618 pieces in 624.266µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 0; ignored 10618
DEBUG: processed 10618 pieces in 686.554µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 262144; ignored 10617
DEBUG: processed 10618 pieces in 917.933µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 1310720; ignored 10613
DEBUG: processed 10618 pieces in 936.909µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 786432; ignored 10615
DEBUG: processed 10618 pieces in 928.002µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 262144; ignored 10617
DEBUG: processed 10618 pieces in 932.441µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 524288; ignored 10616
DEBUG: processed 10618 pieces in 1.011431ms with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 1310720; ignored 10613
DEBUG: processed 10618 pieces in 716.872µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 1310720; ignored 10613
DEBUG: processed 10618 pieces in 671.746µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 262144; ignored 10617
DEBUG: processed 10618 pieces in 930.827µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 262144; ignored 10617
DEBUG: processed 10618 pieces in 895.801µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 262144; ignored 10617
DEBUG: processed 10618 pieces in 674.781µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 262144; ignored 10617
DEBUG: processed 10618 pieces in 676.084µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 262144; ignored 10617
DEBUG: processed 10618 pieces in 928.193µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 786432; ignored 10615
DEBUG: processed 10618 pieces in 1.039143ms with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 1310720; ignored 10613
DEBUG: processed 10618 pieces in 674.49µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 262144; ignored 10617
DEBUG: processed 10618 pieces in 691.813µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 786432; ignored 10615
DEBUG: processed 10618 pieces in 726.279µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 524288; ignored 10616
DEBUG: processed 10618 pieces in 1.16916ms with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 786432; ignored 10615
DEBUG: processed 10618 pieces in 822.241µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 262144; ignored 10617
DEBUG: processed 10618 pieces in 719.486µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 262144; ignored 10617
DEBUG: processed 10618 pieces in 955.565µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 524288; ignored 10616
I0208 01:47:13.012255 98844 torrent_fetcher.go:189] [2a730fe718b7bacdf7f9809a9bf39584d5e03a18-49f494c7ad130b7525e457e079cc126e] downloading: 88.81% (9425/10618 pieces), peer count: 119 total/119 active/0 pending/0 half open, connected seed count: 94

the notable difference is that in "fast" one there are way less peers

anacrolix commented 5 months ago

Should we return false here to stop tree traversal?

https://github.com/anacrolix/torrent/blob/877c4affcd72050c9b6023dd4df2fef9d62d2f47/request-strategy/order.go#L82-L84

901 is that's the case.

I mentioned that change in https://github.com/anacrolix/torrent/issues/897#issuecomment-1931141820.

anacrolix commented 5 months ago

I added a bit of logging, it looks like Scan loop never finishes based on MaxUnverifiedBytes condition, in most cases in each iteration it processes all pieces and ends up with allTorrentsUnverifiedBytes equal to just a few pieces, sometimes none pieces is total number of items processed in Scan() MaxUnverifiedBytes is input.MaxUnverifiedBytes() allTorrentsUnverifiedBytes is allTorrentsUnverifiedBytes after Scan finishes

  |   | 2024-02-08 01:25:40 | DEBUG: processed 10618 pieces in 941.912µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 262144
  |   | 2024-02-08 01:25:40 | DEBUG: processed 10618 pieces in 733.896µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 0
  |   | 2024-02-08 01:25:40 | DEBUG: processed 10618 pieces in 964.139µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 524288
  |   | 2024-02-08 01:25:40 | DEBUG: processed 10618 pieces in 931.727µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 524288
  |   | 2024-02-08 01:25:40 | DEBUG: processed 10618 pieces in 995.079µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 262144
  |   | 2024-02-08 01:25:39 | DEBUG: processed 10618 pieces in 707.152µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 786432
  |   | 2024-02-08 01:25:39 | DEBUG: processed 10618 pieces in 707.072µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 786432
  |   | 2024-02-08 01:25:39 | DEBUG: processed 10618 pieces in 754.102µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 1835008

upd: actually in case of "fast" downloads I see reasonable numbers there, often maxing out MaxUnverifiedBytes and ignoring not too many pieces.

fast:

8 01:37:09.958265 04959 torrent_fetcher.go:189] [6231b11bf40d5861bea8a2bb6b1be9195ceba3b2-1c3ca765bccf392443778ccf9d2f0398] downloading: 24.75% (2599/10618 pieces), peer count: 10 total/10 active/0 pending/0 half open, connected seed count: 10
I0208 01:37:12.958715 04959 torrent_fetcher.go:189] [6231b11bf40d5861bea8a2bb6b1be9195ceba3b2-1c3ca765bccf392443778ccf9d2f0398] downloading: 37.36% (3935/10618 pieces), peer count: 11 total/11 active/0 pending/0 half open, connected seed count: 11
DEBUG: processed 8049 pieces in 882.251µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 17039360; ignored 7984
DEBUG: processed 7999 pieces in 897.7µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 17039360; ignored 7934
DEBUG: processed 7625 pieces in 850.502µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 17039360; ignored 7560
DEBUG: processed 7143 pieces in 803.434µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 17039360; ignored 7078
I0208 01:37:15.959585 04959 torrent_fetcher.go:189] [6231b11bf40d5861bea8a2bb6b1be9195ceba3b2-1c3ca765bccf392443778ccf9d2f0398] downloading: 49.87% (5258/10618 pieces), peer count: 11 total/11 active/0 pending/0 half open, connected seed count: 11
DEBUG: processed 6186 pieces in 784.117µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 17039360; ignored 6121
DEBUG: processed 5907 pieces in 664.852µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 17039360; ignored 5842
I0208 01:37:18.957390 04959 torrent_fetcher.go:189] [6231b11bf40d5861bea8a2bb6b1be9195ceba3b2-1c3ca765bccf392443778ccf9d2f0398] downloading: 62.54% (6601/10618 pieces), peer count: 10 total/10 active/0 pending/0 half open, connected seed count: 10
DEBUG: processed 5167 pieces in 751.535µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 17039360; ignored 5102
DEBUG: processed 5097 pieces in 582.187µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 17039360; ignored 5032
DEBUG: processed 5954 pieces in 660.525µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 17039360; ignored 5889
I0208 01:37:21.957372 04959 torrent_fetcher.go:189] [6231b11bf40d5861bea8a2bb6b1be9195ceba3b2-1c3ca765bccf392443778ccf9d2f0398] downloading: 76.20% (8058/10618 pieces), peer count: 10 total/10 active/0 pending/0 half open, connected seed count: 10
DEBUG: processed 9091 pieces in 1.006716ms with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 17039360; ignored 9026
I0208 01:37:24.957483 04959 torrent_fetcher.go:189] [6231b11bf40d5861bea8a2bb6b1be9195ceba3b2-1c3ca765bccf392443778ccf9d2f0398] downloading: 89.80% (9504/10618 pieces), peer count: 9 total/9 active/0 pending/0 half open, connected seed count: 9
DEBUG: processed 9950 pieces in 1.055328ms with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 17039360; ignored 9885
DEBUG: processed 10618 pieces in 802.863µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 6815744; ignored 10592
I0208 01:37:27.957792 04959 torrent_fetcher.go:183] [6231b11bf40d5861bea8a2bb6b1be9195ceba3b2-1c3ca765bccf392443778ccf9d2f0398] torrent downloaded.

slow:

I0208 01:47:10.011510 98844 torrent_fetcher.go:189] [2a730fe718b7bacdf7f9809a9bf39584d5e03a18-49f494c7ad130b7525e457e079cc126e] downloading: 88.26% (9369/10618 pieces), peer count: 121 total/121 active/0 pending/0 half open, connected seed count: 93
DEBUG: processed 10618 pieces in 913.435µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 262144; ignored 10617
DEBUG: processed 10618 pieces in 901.311µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 0; ignored 10618
DEBUG: processed 10618 pieces in 694.087µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 262144; ignored 10617
DEBUG: processed 10618 pieces in 679.059µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 262144; ignored 10617
DEBUG: processed 10618 pieces in 624.266µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 0; ignored 10618
DEBUG: processed 10618 pieces in 686.554µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 262144; ignored 10617
DEBUG: processed 10618 pieces in 917.933µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 1310720; ignored 10613
DEBUG: processed 10618 pieces in 936.909µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 786432; ignored 10615
DEBUG: processed 10618 pieces in 928.002µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 262144; ignored 10617
DEBUG: processed 10618 pieces in 932.441µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 524288; ignored 10616
DEBUG: processed 10618 pieces in 1.011431ms with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 1310720; ignored 10613
DEBUG: processed 10618 pieces in 716.872µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 1310720; ignored 10613
DEBUG: processed 10618 pieces in 671.746µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 262144; ignored 10617
DEBUG: processed 10618 pieces in 930.827µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 262144; ignored 10617
DEBUG: processed 10618 pieces in 895.801µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 262144; ignored 10617
DEBUG: processed 10618 pieces in 674.781µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 262144; ignored 10617
DEBUG: processed 10618 pieces in 676.084µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 262144; ignored 10617
DEBUG: processed 10618 pieces in 928.193µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 786432; ignored 10615
DEBUG: processed 10618 pieces in 1.039143ms with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 1310720; ignored 10613
DEBUG: processed 10618 pieces in 674.49µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 262144; ignored 10617
DEBUG: processed 10618 pieces in 691.813µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 786432; ignored 10615
DEBUG: processed 10618 pieces in 726.279µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 524288; ignored 10616
DEBUG: processed 10618 pieces in 1.16916ms with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 786432; ignored 10615
DEBUG: processed 10618 pieces in 822.241µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 262144; ignored 10617
DEBUG: processed 10618 pieces in 719.486µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 262144; ignored 10617
DEBUG: processed 10618 pieces in 955.565µs with MaxUnverifiedBytes 16777216 and allTorrentsUnverifiedBytes 524288; ignored 10616
I0208 01:47:13.012255 98844 torrent_fetcher.go:189] [2a730fe718b7bacdf7f9809a9bf39584d5e03a18-49f494c7ad130b7525e457e079cc126e] downloading: 88.81% (9425/10618 pieces), peer count: 119 total/119 active/0 pending/0 half open, connected seed count: 94

the notable difference is that in "fast" one there are way less peers

I don't see much helpful in the data, but your final observation is potentially very useful. If the max unverified bytes is maxxed out, the implementation will struggle to assign requests to peers as there's less to choose from. This means there will be a lot of churn, and smaller request windows, which would feedback into more request updates (and CPU) spent.

anacrolix commented 5 months ago

Looks like there is an improvement, but perf is still relatively low, especially when multiple torrents are downloaded. When downloading two torrent files in parallel, one with 10k pieces, second with 30k pieces I'm getting roughly ~2300 pieces/min, ~10MB/s. 30% of time (under global lock) is spent in torrent.requestStrategyTorrent.IgnorePiece (the code running includes "fix" for avoiding Torrent map lookup). Screenshot 2024-02-07 at 10 51 08 AM

This is interesting. I wonder if I've neglected to filter the requestable pieces by whether they're wanted already, that extra check shouldn't be necessary. Perhaps it was added by accident by the contributed change that added IgnorePiece. I will check this, because removing that check will be a huge gain in performance.

veshij commented 5 months ago

Which exact check? I can apply local patch and see if there is difference.

veshij commented 5 months ago

Another perf-related question: In profiles I collected correctness check https://github.com/anacrolix/torrent/blob/c73c99b372114204b3e62ab7fe7103b2477c7dee/request-strategy/order.go#L60-L66 takes up to 10% cpu time. Would it make sense to remove it from runtime and add a unit-test for that instead (if not already)?

Screenshot 2024-02-07 at 11 25 12 PM
veshij commented 5 months ago

. If the max unverified bytes is maxxed out, the implementation will struggle to assign requests to peers as there's less to choose from.

I think I'm seeing the opposite:

anacrolix commented 5 months ago

Another perf-related question: It profiles I collected correctness check

https://github.com/anacrolix/torrent/blob/c73c99b372114204b3e62ab7fe7103b2477c7dee/request-strategy/order.go#L60-L66

takes up to 10% cpu time. Would it make sense to remove it from runtime and add a unit-test for that instead (if not already)?

Screenshot 2024-02-07 at 11 25 12 PM

I think I dabbled with a compile time or env var "check" flag that does extra pedantic stuff. I guess if it's never fired, it can be commented or if falsed out for now.

anacrolix commented 5 months ago

. If the max unverified bytes is maxxed out, the implementation will struggle to assign requests to peers as there's less to choose from.

I think I'm seeing the opposite:

  • in case of fast downloads I often see maxing out MaxUnverifiedBytes and relatively small amount of time spent in GetRequestablePieces. With MaxUnverifiedBytes change (return false) - even less cpu time spent there.
  • in case of slow - most of the blocks are skipped due to IgnorePiece, MaxUnverifiedBytes is never reached, full scans, lots of churn.

This fits in with the result of the checks I mentioned doing above https://github.com/anacrolix/torrent/issues/897#issuecomment-1933433110. I wanted to see if pieces could be elided from the piece request order if they aren't needed, however they're put there in order to ensure the storage capacity checks work. However if there is no storage capacity limit, it could be possible to not insert piece request order entries for pieces we don't want (they're completed, or have no priority assigned). This will only provide benefit for storage with no capacity (happens to be your case, and possibly most cases but not all). ~It also won't prevent iterating most or all pieces when the torrent is mostly incomplete.~ That would work because as soon as enough missing pieces are iterated over that the max unverified bytes ceiling is hit, the scan will return. That will happen much faster as pieces aren't ignored. I think this will work.

I really think I need some sample torrents so I can do some profiling and try a few things out. Is it possible to find a public torrent with lots of peers that reproduces the problem for you?

veshij commented 5 months ago

I think I dabbled with a compile time or env var "check" flag that does extra pedantic stuff. I guess if it's never fired, it can be commented or if falsed out for now.

removed in https://github.com/anacrolix/torrent/pull/902

anacrolix commented 5 months ago

@veshij please try with https://github.com/anacrolix/torrent/tree/issue-897. I have a benchmark, and generated a -97% time reduction for uncapped storage. The specific commit is https://github.com/anacrolix/torrent/commit/da477673eed6161ee6def7d100cf7c37f513f7dd.

veshij commented 4 months ago

Deployed from latest master, everything looks quite good! Download speeds are capped by rate-limiters, cpu time is low:

Screenshot 2024-02-09 at 2 29 19 PM

Thank you for your help!

anacrolix commented 4 months ago

Awesome! I'll make a release.