arvidn / libtorrent

an efficient feature complete C++ bittorrent implementation
http://libtorrent.org

Advice needed for creating a high speed config #4097

Closed fusk-l closed 4 years ago

fusk-l commented 5 years ago

I look up basically every setting in an attempt to understand what it does before changing anything I might have read about in a guide somewhere. Most of them I understand, but there are some I don't understand very well, and it can be difficult to keep track of how well settings play with each other (disk settings and so on). It's also not easy to find information on Google, since discussions mostly center around specific settings rather than the entire config.

I understand the "high performance seed" profile is (primarily) for seeding only, and there is no "high performance both ways" profile, so I figured I'd use the performance profile as a base to create a config that also performs well during download.

Libtorrent 1.2.2. Deluge 2.0.3. Windows 10.

Right now I'm running the "high performance seed" profile with these settings. Notice anything wrong with this?

        "active_checking": 1,
        "active_downloads": 3,
        "active_limit": 2000,
        "active_loaded_limit": 0,
        "active_seeds": 2000,
        "active_tracker_limit": 2000,
        "aio_threads": 16,
        "allow_multiple_connections_per_ip": true,
        "allow_partial_disk_writes": true,
        "allowed_enc_level": 3,
        "allowed_fast_set_size": 0,
        "announce_to_all_tiers": true,
        "announce_to_all_trackers": true,
        "auto_sequential": true,
        "cache_buffer_chunk_size": 128,
        "cache_expiry": 400,
        "cache_size": 65536,
        "cache_size_volatile": 256,
        "checking_mem_usage": 2048,
        "choking_algorithm": 0,
        "close_file_interval": 120,
        "coalesce_reads": true,
        "coalesce_writes": true,
        "connection_speed": 200,
        "connections_limit": 8000,
        "contiguous_recv_buffer": true,
        "disk_io_read_mode": 0,
        "disk_io_write_mode": 0,
        "download_rate_limit": 80000000,
        "enable_dht": false,
        "enable_incoming_tcp": true,
        "enable_incoming_utp": true,
        "enable_lsd": false,
        "enable_natpmp": false,
        "enable_outgoing_tcp": true,
        "enable_outgoing_utp": true,
        "enable_upnp": false,
        "file_pool_size": 500,
        "guided_read_cache": true,
        "half_open_limit": 50,
        "in_enc_policy": 1,
        "inactivity_timeout": 20,
        "listen_interfaces": "192.168.1.101:49997",
        "listen_queue_size": 3000,
        "low_prio_disk": false,
        "max_allowed_in_request_queue": 2000,
        "max_failcount": 3,
        "max_http_recv_buffer_size": 6291456,
        "max_out_request_queue": 1500,
        "max_queued_disk_bytes": 7340032,
        "max_rejects": 10,
        "mixed_mode_algorithm": 0,
        "out_enc_policy": 1,
        "peer_timeout": 20,
        "piece_extent_affinity": true,
        "predictive_piece_announce": 250,
        "prefer_rc4": true,
        "prefer_udp_trackers": true,
        "rate_limit_ip_overhead": false,
        "rate_limit_utp": false,
        "read_cache_line_size": 32,
        "recv_socket_buffer_size": 1048576,
        "request_timeout": 10,
        "seed_choking_algorithm": 1,
        "send_buffer_low_watermark": 1048576,
        "send_buffer_watermark": 3145728,
        "send_buffer_watermark_factor": 150,
        "send_socket_buffer_size": 1048576,
        "suggest_mode": 0,
        "torrent_connect_boost": 60,
        "tracker_backoff": 5,
        "unchoke_slots_limit": -1,
        "upload_rate_limit": 57000000,
        "use_disk_cache_pool": false,
        "use_disk_read_ahead": true,
        "use_parole_mode": true,
        "use_read_cache": true,
        "use_write_cache": true,
        "volatile_read_cache": false,
        "whole_pieces_threshold": 20,
        "write_cache_line_size": 256

Edit: Updated to reflect present settings.
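
(For reference, a minimal sketch of how a config like this could be applied on top of the preset through libtorrent's own C++ API, assuming the 1.2 settings_pack interface; Deluge sets the same keys by name from its configuration, so this is only illustrative.)

        #include <libtorrent/session.hpp>
        #include <libtorrent/settings_pack.hpp>

        namespace lt = libtorrent;

        int main()
        {
            // start from the shipped high-performance preset, then override
            // a few of the values from the config above
            lt::settings_pack pack = lt::high_performance_seed();
            pack.set_int(lt::settings_pack::aio_threads, 16);
            pack.set_int(lt::settings_pack::active_downloads, 3);
            pack.set_int(lt::settings_pack::connections_limit, 8000);
            pack.set_bool(lt::settings_pack::piece_extent_affinity, true);
            pack.set_str(lt::settings_pack::listen_interfaces, "192.168.1.101:49997");

            lt::session ses(pack);
            // ... add torrents and run the alert loop here
        }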

Also, I'm uncertain about "piece_extent_affinity". I understand it's supposed to help with disk I/O, but at the same time it can hurt performance. Does anyone have any experience with this setting?

arvidn commented 5 years ago

the piece_extent_affinity is a feature for downloaders meant to help with disk I/O on the seeders. It creates an affinity to request windows of contiguous bytes larger than the piece size (which is traditionally the largest contiguous range to request). This is especially helpful for torrents with small pieces.

The PR where this feature landed has some more references. Most notably, this.

arvidn commented 5 years ago

I would say it's risky to set the download rate limit at all. Just involving the rate limiter, even with very high limits, adds the risk of it artificially limiting the rate because of internal inefficiencies.

Similarly, if you want unlimited unchoke slots, set it to unlimited rather than a high value. This saves one round-trip of "interested" -> "unchoke" messages when a new peer connects. If the number of unchoke slots is unlimited, all peers are unchoked pre-emptively, whether they are interested or not.
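
(A small sketch of those two suggestions in settings_pack terms, assuming the 1.2 API, where 0 means unlimited for the rate limits and -1 means unlimited for unchoke slots.)

        #include <libtorrent/settings_pack.hpp>

        namespace lt = libtorrent;

        // keep the rate limiter out of the picture and unchoke everyone up front
        void disable_limits(lt::settings_pack& pack)
        {
            pack.set_int(lt::settings_pack::download_rate_limit, 0);  // 0 = unlimited
            pack.set_int(lt::settings_pack::upload_rate_limit, 0);    // 0 = unlimited
            pack.set_int(lt::settings_pack::unchoke_slots_limit, -1); // -1 = unlimited
        }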

I don't see why you would want to set "allowed_fast_set_size" to 0. Do you have any observations that the default is detrimental to performance?

Technically, setting "prefer_rc4" to false and disabling encryption (or obfuscation) is cheaper from a disk I/O point of view. If payload is sent in the clear, multiple peers can share the same buffers from the disk cache to send from, rather than having to make a copy just to encrypt it.
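
(For illustration, the knobs involved, assuming the pe_* constants from the settings_pack reference; whether to allow plaintext connections at all is a policy choice, not a recommendation.)

        #include <libtorrent/settings_pack.hpp>

        namespace lt = libtorrent;

        // allow encrypted connections but don't insist on RC4 payload encryption,
        // so disk cache buffers can be shared between peers instead of copied
        void relax_encryption(lt::settings_pack& pack)
        {
            pack.set_bool(lt::settings_pack::prefer_rc4, false);
            pack.set_int(lt::settings_pack::in_enc_policy, lt::settings_pack::pe_enabled);
            pack.set_int(lt::settings_pack::out_enc_policy, lt::settings_pack::pe_enabled);
            pack.set_int(lt::settings_pack::allowed_enc_level, lt::settings_pack::pe_both);
        }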

suggest_mode is a bit experimental. If you want to enable it, make sure you collect some solid data supporting it.

Setting max_failcount to 1 is unlikely to help; since you allow so many connections anyway, holding on to a peer a bit longer most likely doesn't hurt.

If disk I/O is a bottleneck (which I would expect it to be given a fast internet connection), you probably want to set aio_threads to something much greater than 2; probably 16 or 32.

fusk-l commented 5 years ago

I would say it's risky to set the download rate limit at all. Just involving the rate limiter, even with very high limits, adds the risk of it artificially limiting the rate because of internal inefficiencies.

Similarly, if you want unlimited unchoke slots, set it to unlimited rather than a high value. This saves one round-trip of "interested" -> "unchoke" messages when a new peer connects. If the number of unchoke slots is unlimited, all peers are unchoked pre-emptively, whether they are interested or not.

Ok, that makes sense. The value is the default from the "high performance seed" profile. Is unlimited -1 or 0 for "unchoke_slots_limit"? I have a rate limit set just to make sure there's headroom for other connections elsewhere on the network, to keep it from saturating everything.

I don't see why you would want to set "allowed_fast_set_size" to 0. Do you have any observations that the default is detrimental to performance?

I don't remember; I believe 0 disables it, but I'm not entirely sure why I set it. It may be because the unchoke limit is high, so the fast set wasn't needed. I've set it back to the default of "5".

suggest_mode is a bit experimental. If you want to enable it, make sure you collect some solid data supporting it.

I read about it and thought it would be a good option; if it's experimental and not quite working as intended, I can turn it off. I don't have solid data to support having it either on or off.

Setting max_failcount to 1 is unlikely to help; since you allow so many connections anyway, holding on to a peer a bit longer most likely doesn't hurt.

Are you suggesting disabling it, or leaving it at the default of "3"?

If disk I/O is a bottleneck (which I would expect it to be given a fast internet connection), you probably want to set aio_threads to something much greater than 2; probably 16 or 32.

If I understand that setting correctly, the value corresponds to the number of CPU threads engaged. If you have a hyperthreaded 2-core CPU, then setting the value to 4 would engage all 4 threads. But I read somewhere that using both threads on one core could mean it fights itself over resources, so the value should match the number of physical cores. But maybe that isn't correct, considering the values you suggest.

xavier2k6 commented 5 years ago

@fusk-l you could raise "checking_mem_usage" to "2048". https://github.com/arvidn/libtorrent/pull/3213 bumped it to be higher than the default in the high_performance_seed preset.

@arvidn would that be a good suggestion for him? This setting is now exposed in qBittorrent 4.2.0 beta, but the "MAX" limit can only be raised to "1024" in the GUI, matching the "DEFAULT" in the libtorrent documentation.

Could you elaborate on this feature, please? Would there be any performance gain for the average user if it were increased to "2048", or is it specifically for high_performance_seed, as the name suggests?

I had asked for the "2048" limit to be exposed or raised over in the qBittorrent tracker, but it was pointed out that the normal default hadn't changed and that high_performance_seed isn't used by qBittorrent.

https://github.com/qbittorrent/qBittorrent/issues/11280#issuecomment-546793883

fusk-l commented 5 years ago

@fusk-l you could raise "checking_mem_usage" to "2048"

#3213 bumped it to be higher than the default in the high_performance_seed preset

If I'm not mistaken, that option only governs memory usage while checking a torrent, and the allocated memory is only in use while a torrent is being checked; otherwise the memory is available for normal operation. I don't check torrents very often. But if increasing it beyond 320 speeds up the checking process, then I see no reason not to bump it.

arvidn commented 5 years ago

Ok, that makes sense. The value is the default from the "high performance seed" profile. Is unlimited -1 or 0 for "unchoke_slots_limit"?

-1 means unlimited. In fact, while I was looking at this, it turns out all the work of sorting peers happens even when the unchoke logic is disabled by setting this to infinite. Would you mind trying out this patch, which should save some CPU when using unlimited unchoke slots?

Also, come to think of it, if you unchoke all peers anyway, it does make sense to set allowed_fast_set_size to 0, as it won't have an effect anyway, and will just cost some unnecessary work.

I read about it and thought it would be a good option, if it's experimental and not quite working as intended i could turn it off. I do not have solid data to support having either on or off.

The main risk with it is that it decides to pull in a lot of data from disk into the cache, and then suggests that all peers come and get it. If the peers don't support the suggest message, or if the cache gets evicted for some other reason before the peers request it, you may end up with worse performance. I'm not confident that such situations cannot happen; that's why I consider it experimental.

Are you suggesting disabling it, or leaving it at the default of "3"?

I think default is fine, unless you discover some case where it's a problem.

If I understand that setting correctly, the value corresponds to the number of CPU threads engaged. If you have a hyperthreaded 2-core CPU, then setting the value to 4 would engage all 4 threads. But I read somewhere that using both threads on one core could mean it fights itself over resources, so the value should match the number of physical cores. But maybe that isn't correct, considering the values you suggest.

Those are specifically disk I/O threads, so they will mostly either be idle (waiting for work) or suspended in a blocking disk I/O system call. Either way, they won't do a lot of actual computation. However, one in every 4 disk I/O threads is dedicated to performing the SHA-1 hashing of incoming data, so those can actually end up using CPU while downloading.

FranciscoPombal commented 5 years ago

@fusk-l If you have a really fast connection, tuning aio_threads should help a lot; setting it to either 4*N_HARDWARE_THREADS or 4*N_PHYSICAL_CORES should yield the best performance.
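
(A sketch of that rule of thumb; std::thread::hardware_concurrency() reports logical processors, so the 4*N_PHYSICAL_CORES variant would need the core count from the wmic command mentioned further down.)

        #include <libtorrent/settings_pack.hpp>
        #include <algorithm>
        #include <thread>

        namespace lt = libtorrent;

        // rule of thumb from this thread: 4x the number of hardware threads
        void tune_disk_threads(lt::settings_pack& pack)
        {
            unsigned const hw = std::max(1u, std::thread::hardware_concurrency());
            pack.set_int(lt::settings_pack::aio_threads, static_cast<int>(4 * hw));
        }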

fusk-l commented 5 years ago

Would you mind trying out this patch, which should save some CPU when using unlimited unchoke slots?

I don't know how to apply the patch; I haven't done that before.

Also, come to think of it, if you unchoke all peers anyway, it does make sense to set allowed_fast_set_size to 0, as it won't have an effect anyway, and will just cost some unnecessary work.

I believe that is why it was set to 0. 3000 is basically unlimited, at least on private trackers.

Those are specifically disk I/O threads, so they will mostly either be idle (waiting for work) or suspended in a blocking disk I/O system call. Either way, they won't do a lot of actual computation. However, one in every 4 disk I/O threads is dedicated to performing the SHA-1 hashing of incoming data, so those can actually end up using CPU while downloading.

I see, so that's why a value of 16 would be fitting for a 4-thread CPU? I haven't really had I/O issues that I know of, but I've changed the value and we'll see if I notice anything.

xavier2k6 commented 5 years ago

Just an FYI

Get CPU Information via Command Prompt (Windows)

Open a command prompt and type the following command:

        wmic cpu get name, numberofcores, numberoflogicalprocessors

Chocobo1 commented 5 years ago

@arvidn would that be a good suggestion for him? This setting is now exposed in qBittorrent 4.2.0 beta, but the "MAX" limit can only be raised to "1024" in the GUI, matching the "DEFAULT" in the libtorrent documentation. ... qbittorrent/qBittorrent#11280 (comment)

Besides my previous reply, you neglected that you are comparing different orders of magnitude here. In libtorrent, the value is specified as a number of 16 KiB blocks, while in qbt it is in MiB. So 1024 MiB in the qBittorrent GUI actually translates to 65536 for libtorrent.

arvidn commented 5 years ago

you could raise "checking_mem_usage" to "2048"

#3213 bumped it to be higher than the default in the high_performance_seed preset

@arvidn would that be a good suggestion for him?

I'm not sure. That knob controls how many outstanding 16 kB read operations are allowed to be in flight at any given time during a full hash-check of a torrent. It depends on the throughput and latency of the disk, I would think. This is non-trivial to determine, as a torrent full of tiny files will (most likely) increase the latency of reading them and would warrant a larger value.
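
(In concrete terms, using the 16 KiB block unit discussed above, the setting caps how much read data is in flight during a check:)

        1024 blocks * 16 KiB = 16 MiB of outstanding reads (the documented default / qBittorrent GUI cap)
        2048 blocks * 16 KiB = 32 MiB of outstanding reads (the bumped high_performance_seed value)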

arvidn commented 5 years ago

I don't know how to apply the patch; I haven't done that before.

If you're already building from source, all you need to do is:

git clone https://github.com/arvidn/libtorrent.git --branch optimize-unimited-unchoke-slots

I see, so that's why a value of 16 would be fitting for a 4-thread CPU? I haven't really had I/O issues that I know of, but I've changed the value and we'll see if I notice anything.

This value is (mostly) not related to the number of cores you have; it's related to how many outstanding disk I/O operations (reads and writes) you can throw at the disk to improve throughput.

This makes a difference even for SSDs with very low latency, because their throughput is so high that the bandwidth-delay product still ends up pretty high. If you're not filling that pipe, you're not getting the max throughput of the drive.
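
(A purely illustrative worked example of that bandwidth-delay product, with made-up drive numbers:)

        2 GB/s throughput * 0.5 ms effective latency = 1 MB that needs to be in flight
        1 MB / 16 KiB per block = 64 outstanding blocks just to keep the drive busy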

fusk-l commented 5 years ago

If you're already building from source, all you need to do is:

git clone https://github.com/arvidn/libtorrent.git --branch optimize-unimited-unchoke-slots

I'm not building from source, someone else is.

https://forum.deluge-torrent.org/viewtopic.php?f=12&t=55463&sid=200c1f9bfde4c809aa71ef24c6c5e168

xavier2k6 commented 5 years ago

@arvidn would that be a good suggestion for him? This setting is now exposed in qBittorrent 4.2.0 beta, but the "MAX" limit can only be raised to "1024" in the GUI, matching the "DEFAULT" in the libtorrent documentation. ... qbittorrent/qBittorrent#11280 (comment)

Besides my previous reply, you neglected that you are comparing different orders of magnitude here. In libtorrent, the value is specified as a number of 16 KiB blocks, while in qbt it is in MiB. So 1024 MiB in the qBittorrent GUI actually translates to 65536 for libtorrent.

Is that conversion right??

arvidn commented 5 years ago

Is that conversion right??

if the unit is MiB, 1024 is 1 GiB, which is 1024*1024 / 16 blocks, so yes, 65536
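
(Spelled out:)

        1024 MiB * 1024 KiB/MiB = 1,048,576 KiB
        1,048,576 KiB / 16 KiB per block = 65,536 blocks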

xavier2k6 commented 5 years ago

Is that conversion right??

if the unit is MiB, 1024 is 1 GiB, which is 1024*1024 / 16 blocks, so yes, 65536

Ah yes, apologies, a miscalculation on my part. I fully understand now.

I suppose users with thousands of torrents would see a performance increase when checking torrents if the limit were raised, though, wouldn't they? So there is a basis for raising this limit, no? Especially when some users' machines have 16/32+ GB of RAM available.

fusk-l commented 5 years ago

Shouldn't the default unchoke limit of 3000 in the "high performance seed" profile be changed to -1? I'm guessing "high performance seed" assumes you have a fast upload, 400/500 Mbit or above.

arvidn commented 5 years ago

Shouldn't the default unchoke limit of 3000 in the "high performance seed" profile be changed to -1?

yes, it probably should

fusk-l commented 5 years ago

Are there any settings we've forgotten that help with disk writes? Is there any merit in changing the value of whole_pieces_threshold to increase disk performance for high-speed downloading?

arvidn commented 5 years ago

not that I can think of off the top of my head. I think the best way is to enable stats alerts and log them to profile what the bottlenecks are.
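
(A minimal sketch of such a profiling loop, assuming libtorrent 1.2's session-stats API; filtering on the "disk." counter prefix is just an example.)

        #include <libtorrent/session.hpp>
        #include <libtorrent/session_stats.hpp>
        #include <libtorrent/alert_types.hpp>
        #include <chrono>
        #include <cstdio>
        #include <string>
        #include <vector>

        namespace lt = libtorrent;

        // ask the session for a stats snapshot and print the disk-related counters
        void dump_disk_counters(lt::session& ses)
        {
            static std::vector<lt::stats_metric> const metrics = lt::session_stats_metrics();

            ses.post_session_stats();
            ses.wait_for_alert(std::chrono::seconds(1));

            std::vector<lt::alert*> alerts;
            ses.pop_alerts(&alerts);
            for (lt::alert* a : alerts)
            {
                auto* st = lt::alert_cast<lt::session_stats_alert>(a);
                if (st == nullptr) continue;
                auto counters = st->counters();
                for (lt::stats_metric const& m : metrics)
                {
                    if (std::string(m.name).rfind("disk.", 0) != 0) continue;
                    std::printf("%s: %lld\n", m.name, static_cast<long long>(counters[m.value_index]));
                }
            }
        }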

fusk-l commented 5 years ago

not that I can think of off the top of my head. I think the best way is to enable stats alerts and log them to profile what the bottlenecks are.

The only bottleneck there is, on occasion, is write I/O. But I know that's already been discussed at great length, so I'll just leave it at that.

xavier2k6 commented 5 years ago

Would "read_cache_line_size" & "write_cache_line_size" being increased -> 512/1024/2048 etc have an impact on a seeding only system & also with disk i/o queue job size when checking torrents/fastresume?

arvidn commented 4 years ago

those settings don't affect checking. Checking always reads sequentially, and the read-ahead is controlled by checking_mem_usage (or something like that).

The risk of increasing the cache line sizes too much is that you start thrashing the cache. They control how many blocks to read in at a time, from disk into the cache. If the other peer isn't requesting blocks sequentially, reading too much is wasteful.
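
(In byte terms, assuming both line sizes are counted in 16 KiB blocks as the reference documentation describes, the values in the config above work out to roughly:)

        read_cache_line_size  = 32 blocks * 16 KiB = 512 KiB read from disk per cache line
        write_cache_line_size = 256 blocks * 16 KiB = 4 MiB flushed contiguously per cache line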

fusk-l commented 4 years ago

With 16 GB of RAM, would it make sense to set "cache_size": 65536 to double that or more? 65536 blocks = 1 GB, which disappears very quickly at gigabit speeds.

What are the numeric values for "disk_io_read_mode" / "disk_io_write_mode"? The reference page only lists "enable_os_cache" and "disable_os_cache", not their numeric values. Is -1 disabled, or is 0 disabled?

"use_write_cache" is depreciated, i am guessing there is no point in having it on.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.