piece_extent_affinity is a feature for downloaders, meant to help with disk I/O on the seeders. It creates an affinity to request windows of contiguous bytes larger than the piece size (which is traditionally the largest contiguous range to request). This is especially helpful for torrents with small pieces.
The PR where this feature landed has some more references. Most notably, this.
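For reference, a minimal sketch of turning it on, assuming the libtorrent 1.2 Python bindings, where a session can be constructed from a plain settings dict:

```python
import libtorrent as lt

# Assumption: 1.2-style settings-dict construction; other settings keep their defaults.
ses = lt.session({
    'piece_extent_affinity': True,  # request larger contiguous extents to ease seeder disk I/O
})
```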
I would say it's risky to set the download rate limit at all. Just involving the rate limiter, even with a very high limit, adds the risk of it artificially limiting the rate because of internal inefficiencies.
Similarly, if you want unlimited unchoke slots, set it to unlimited rather than a high value. This saves one round-trip of "interested" -> "unchoke" messages when a new peer connects. If the number of unchoke slots is unlimited, all peers are unchoked pre-emptively, whether they are interested or not.
I don't see why you would want to set "allowed_fast_set_size" to 0. Do you have any observations that the default is detrimental to performance?
Technically, setting "prefer_rc4" to false, and disabling encryption (or, obfuscation) is cheaper from a disk I/O point of view. If payload is sent in the clear, multiple peers can share the same buffers from the disk cache to send from, rather than having to make a copy just to encrypt it.
suggest_mode is a bit experimental. If you want to enable it, make sure you collect some solid data supporting it.
Setting max_failcount to 1 is unlikely to help, since you allow so many connections anyway, holding on to a peer a bit longer most likely doesn't hurt.
If disk IO is a bottleneck (which I would expect it to be given a fast internet connection), you probably want to set aio_threads to something much greater than 2, probably 16 or 32.
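As a rough illustration of the points above, here is a hedged sketch using the Python bindings' apply_settings(); the values are only examples drawn from this discussion, not recommendations:

```python
import libtorrent as lt

# Illustrative values only, reflecting the advice above; tune for your own setup.
tuning = {
    'download_rate_limit': 0,   # 0 = unlimited: keeps the rate limiter out of the path
    'upload_rate_limit': 0,     # same reasoning for upload
    'unchoke_slots_limit': -1,  # -1 = unlimited: peers are unchoked pre-emptively
    'aio_threads': 16,          # many more disk I/O threads than 2, per the advice above
    'max_failcount': 3,         # leave at the default of 3
}

ses = lt.session()
ses.apply_settings(tuning)
```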
I would say it's risky to set the download rate limit at all. Just involving the rate limiter, even with a very high limit, adds the risk of it artificially limiting the rate because of internal inefficiencies.
Similarly, if you want unlimited unchoke slots, set it to unlimited rather than a high value. This saves one round-trip of "interested" -> "unchoke" messages when a new peer connects. If the number of unchoke slots is unlimited, all peers are unchoked pre-emptively, whether they are interested or not.
Ok, that makes sense. The value is the default from the "high performance seed" profile. Is unlimited -1 or 0 for "unchoke_slots_limit"? I have a rate limit set just to make sure there's headroom for other connections elsewhere on the network, to prevent it from saturating everything.
I don't see why you would want to set "allowed_fast_set_size" to 0. Do you have any observations that the default is detrimental to performance?
I do not remember; I believe the 0 is to disable it, but I'm not entirely sure why. It may be because the unchoke limit is high, so the fast set wasn't needed. I've set it back to the default "5".
suggest_mode is a bit experimental. If you want to enable it, make sure you collect some solid data supporting it.
I read about it and thought it would be a good option; if it's experimental and not quite working as intended, I can turn it off. I do not have solid data to support having it either on or off.
Setting max_failcount to 1 is unlikely to help, since you allow so many connections anyway, holding on to a peer a bit longer most likely doesn't hurt.
You're suggesting disabling it, or leaving it at the default of "3"?
If disk IO is a bottleneck (which I would expect it to be given a fast internet connection), you probably want to set aio_threads to something much greater than 2, probably 16 or 32.
If I understand that setting correctly, the value corresponds to the number of engaged CPU threads. If you have a hyperthreaded 2-core CPU, then setting the value to 4 would engage all 4 threads. But I read somewhere that utilizing both threads on one core could mean it's fighting itself over resources, so the value should be the number of physical cores. But maybe that is not correct, considering the values you suggest.
@fusk-l you could raise "checking_mem_usage" to "2048". https://github.com/arvidn/libtorrent/pull/3213 bumped it to be higher than the default in the high_performance_seed preset.
@arvidn would that be a good suggestion for him? This is now a feature available in qBittorrent 4.2.0 beta, but the "MAX" limit can only be raised to "1024" in the GUI, as per the "DEFAULT" in the libtorrent documentation.
Could you elaborate more on this feature, please? Would there be any performance gain for the average user if it was increased to "2048", or is it specifically for high_performance_seed, as the name suggests?
I had asked for the "2048" limit to be exposed or raised over in the qBittorrent tracker, but it was suggested that the normal value hadn't changed & high_performance_seed wasn't used by qBittorrent.
https://github.com/qbittorrent/qBittorrent/issues/11280#issuecomment-546793883
@fusk-l you could raise "checking_mem_usage" to "2048". #3213 bumped it to be higher than the default in the high_performance_seed preset.
If I'm not mistaken, that option only controls memory usage while checking a torrent; the memory is only allocated when a torrent is being checked, and otherwise it is used for normal operation. I do not check torrents very often, but if increasing it beyond 320 speeds up the checking process, then I see no reason not to bump it.
Ok, that makes sense. The value is the default from the "high performance seed" profile. Is unlimited -1 or 0 for "unchoke_slots_limit"?
-1 means unlimited. In fact, while I was looking at this, it turns out all the work of sorting peers happens even when the unchoke logic is disabled by setting this to infinite. Would you mind trying out this patch, which should save some CPU when using unlimited unchoke slots?
Also, come to think of it, if you unchoke all peers anyway, it does make sense to set allowed_fast_set_size to 0, as it won't have an effect anyway, and will just cost some unnecessary work.
I read about it and thought it would be a good option; if it's experimental and not quite working as intended, I can turn it off. I do not have solid data to support having it either on or off.
The main risk with it is that it decides to pull in a lot of data from disk into the cache, and then suggests to all peers that they come get it. If the peers don't support the suggest message, or if the cache gets evicted for some other reason before the peers request it, you may end up with worse performance. I'm not confident that such situations cannot happen; that's why I consider it experimental.
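If experimenting with it, a minimal sketch of toggling it via the Python bindings; the value meanings below are as I read the settings_pack documentation:

```python
import libtorrent as lt

# suggest_mode values as documented (to the best of my reading):
#   0 = no_piece_suggestions (the default), 1 = suggest_read_cache
ses = lt.session()
ses.apply_settings({'suggest_mode': 1})  # experiment: suggest pieces already in the read cache
# ...measure, and fall back to the default if it doesn't help:
ses.apply_settings({'suggest_mode': 0})
```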
You're suggesting disabling it, or leaving it at the default of "3"?
I think default is fine, unless you discover some case where it's a problem.
If I understand that setting correctly, the value corresponds to the number of engaged CPU threads. If you have a hyperthreaded 2-core CPU, then setting the value to 4 would engage all 4 threads. But I read somewhere that utilizing both threads on one core could mean it's fighting itself over resources, so the value should be the number of physical cores. But maybe that is not correct, considering the values you suggest.
Those are specifically disk I/O threads, so they will mostly either be idle (waiting for work) or suspended in a blocking disk I/O system call. Either way, they won't do a lot of actual computation. However, one in every 4 disk I/O threads is dedicated to performing the SHA-1 hashing of incoming data, so those can actually end up using CPU while downloading.
@fusk-l If you have a really fast connection, tuning aio_threads should help a lot; setting it to either 4*N_HARDWARE_THREADS or 4*N_PHYSICAL_CORES should yield the best performance.
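A small sketch of deriving those two candidate values on the machine itself; os.cpu_count() reports logical threads, and the physical-core figure below simply assumes 2-way SMT, which may not match your CPU:

```python
import os

hw_threads = os.cpu_count() or 1          # logical (hardware) threads
physical_cores = max(1, hw_threads // 2)  # assumption: hyperthreading doubles the count

print('4 * N_HARDWARE_THREADS =', 4 * hw_threads)
print('4 * N_PHYSICAL_CORES   =', 4 * physical_cores)
```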
Would you mind trying out this patch, which should save some CPU when using unlimited unchoke slots?
I do not know how to apply the patch; I have not done that before.
Also, come to think of it, if you unchoke all peers anyway, it does make sense to set allowed_fast_set_size to 0, as it won't have an effect anyway, and will just cost some unnecessary work.
I believe that is why it was set to 0. 3000 is basically unlimited, at least on private trackers.
Those are specifically disk I/O threads, so they will mostly either be idle (waiting for work) or suspended in a blocking disk I/O system call. Either way, they won't do a lot of actual computation. However, one in every 4 disk I/O threads is dedicated to performing the SHA-1 hashing of incoming data, so those can actually end up using CPU while downloading.
I see, so that's why a value of 16 would be fitting for a 4-thread CPU? I've not really had I/O issues that I know of, but I changed the value and we'll see if I notice anything.
Just an FYI
Get CPU Information via Command Prompt (Windows)
Open command prompt. Type the following command: wmic cpu get name, numberofcores, numberoflogicalprocessors
@arvidn would that be a good suggestion for him? This is now a feature available in qBittorrent 4.2.0 beta, but the "MAX" limit can only be raised to "1024" in the GUI, as per the "DEFAULT" in the libtorrent documentation. ... qbittorrent/qBittorrent#11280 (comment)
Besides my previous reply, you neglected that you are comparing different orders of magnitude here.
In libtorrent, the value is specified as a number of 16 KiB blocks, while in qBittorrent it is in MiB. So 1024 MiB from the qBittorrent GUI is actually translated to 65536 for libtorrent.
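A tiny worked sketch of that unit conversion (16 KiB blocks assumed, per the quote above):

```python
def mib_to_blocks(mib: int, block_kib: int = 16) -> int:
    """Convert a MiB figure (qBittorrent GUI) to 16 KiB blocks (libtorrent)."""
    return mib * 1024 // block_kib

print(mib_to_blocks(1024))  # -> 65536 blocks for 1024 MiB (1 GiB)
```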
you could raise "checking_mem_usage" to "2048". #3213 bumped it to be higher than the default in the high_performance_seed preset.
@arvidn would that be a good suggestion for him?
I'm not sure. That knob controls how many outstanding 16kB read operations are allowed to be in-flight at any given time during a full hash-check of a torrent. It depends on the throughput and latency of the disk, I would think. This is non-trivial to determine, as a torrent full of tiny files will (most likely) increase the latency of reading them and would warrant a larger value.
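For a feel of the memory involved, an illustrative conversion of a few checking_mem_usage values (counted in 16 KiB blocks) into MiB of outstanding reads; the values shown are just the ones mentioned in this thread:

```python
BLOCK = 16 * 1024  # bytes per block; checking_mem_usage counts these blocks

for blocks in (320, 1024, 2048):
    print(f'checking_mem_usage = {blocks:5d} -> {blocks * BLOCK // (1024 * 1024)} MiB of outstanding reads')
```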
I do not know how to apply the patch; I have not done that before.
If you're already building from source, all you need to do is:
git clone https://github.com/arvidn/libtorrent.git --branch optimize-unimited-unchoke-slots
I see, so that's why a value of 16 would be fitting for a 4-thread CPU? I've not really had I/O issues that I know of, but I changed the value and we'll see if I notice anything.
This value is (mostly) not related to the number of cores you have; it's related to how many outstanding disk I/O operations (reads and writes) you can throw at the disk to improve throughput.
This makes a difference even for SSDs with very low latency, because their throughput is so high that the bandwidth-delay product still ends up pretty high. If you're not filling that pipe, you're not getting the max throughput of the drive.
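A rough, made-up illustration of that bandwidth-delay argument for a hypothetical SSD; the throughput and latency figures are assumptions, not measurements:

```python
# Hypothetical drive characteristics; replace with measured numbers.
throughput = 500 * 1024 * 1024   # bytes per second
latency = 0.0005                 # seconds per I/O operation
block = 16 * 1024                # bytes per libtorrent disk job

in_flight = throughput * latency / block
print(f'~{in_flight:.0f} outstanding 16 KiB jobs needed to keep this drive busy')
```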
If you're already building from source, all you need to do is:
git clone https://github.com/arvidn/libtorrent.git --branch optimize-unimited-unchoke-slots
I'm not building from source, someone else is.
https://forum.deluge-torrent.org/viewtopic.php?f=12&t=55463&sid=200c1f9bfde4c809aa71ef24c6c5e168
@arvidn would that be a good suggestion for him? This is now a feature available in qBittorrent 4.2.0 beta, but the "MAX" limit can only be raised to "1024" in the GUI, as per the "DEFAULT" in the libtorrent documentation. ... qbittorrent/qBittorrent#11280 (comment)
Besides my previous reply, you neglected that you are comparing different orders of magnitude here. In libtorrent, the value is specified as a number of 16 KiB blocks, while in qBittorrent it is in MiB. So 1024 MiB from the qBittorrent GUI is actually translated to 65536 for libtorrent.
Is that conversion right??
Is that conversion right??
If the unit is MiB, 1024 is 1 GiB, which is 1024*1024 / 16 blocks, so yes, 65536.
Is that conversion right??
If the unit is MiB, 1024 is 1 GiB, which is 1024*1024 / 16 blocks, so yes, 65536.
Ah yes, apologies, a miscalculation on my part. I fully understand now.
I suppose users with thousands of torrents would see a performance increase when checking torrents if the limit was raised, though, wouldn't they? So there is a basis for raising this limit, no? Especially when some users' machines have 16/32+ GB of RAM available.
Shouldn't the default 3000 unchoke limit in the "high performance seed" profile be changed to -1? I'm guessing "high performance seed" assumes you have a fast upload, 400/500 Mbit or above.
Shouldn't the default 3000 unchoke limit in the "high performance seed" profile be changed to -1?
Yes, it probably should.
Are there any settings we've forgotten that help with disk writes? Is there any merit in changing the value of whole_pieces_threshold to increase disk performance for high-speed downloading?
not that I can think of off the top of my head. I think the best way is to enable stats alerts and log them to profile what the bottlenecks are.
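A hedged sketch of what "enable stats alerts and log them" might look like with the 1.2 Python bindings, requesting session counters with post_session_stats() and printing each session_stats_alert:

```python
import time
import libtorrent as lt

ses = lt.session({'alert_mask': lt.alert.category_t.all_categories})
# ... add torrents here ...

for _ in range(12):                 # log for about a minute as a demonstration
    ses.post_session_stats()        # request a session_stats_alert
    time.sleep(5)
    for a in ses.pop_alerts():
        if isinstance(a, lt.session_stats_alert):
            print(a.message())      # one line containing the performance counters
```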
not that I can think of off the top of my head. I think the best way is to enable stats alerts and log them to profile what the bottlenecks are.
The only bottleneck there is, on occasion, is write I/O. But I know that's already been discussed at great length, so I'll just leave it at that.
Would "read_cache_line_size" & "write_cache_line_size" being increased -> 512/1024/2048 etc have an impact on a seeding only system & also with disk i/o queue job size when checking torrents/fastresume?
Those settings don't affect checking. Checking always reads sequentially, and the read-ahead is controlled by checking_mem_usage (or something like that).
The risk of increasing the cache line size too much is that you start thrashing the cache. Those settings control how many blocks to read at a time from disk into the cache. If the other peer isn't requesting blocks sequentially, reading too much is wasteful.
With 16 GB of RAM, would it make sense to set "cache_size": 65536 to double or more? 65536 blocks = 1 GiB, which disappears very quickly at gigabit speeds.
What are the values for "disk_io_read_mode" / "disk_io_write_mode"? The reference page only has "enable_os_cache" and "disable_os_cache", but not their numeric values. Is -1 disabled, or is it 0 for disabled?
"use_write_cache" is deprecated, so I am guessing there is no point in having it on.
I look up basically every setting in an attempt to understand what it does before changing something I might've read in a guide somewhere. Most I understand, and there are those I don't understand very well, while it can also be difficult to keep track of how well settings play with each other, like disk settings etc. And it's not easy to find info on Google, as discussions mostly center around specific settings rather than an entire config.
I understand the "high performance seed" profile is (primarily) for seeding only; there is no "high performance both ways" profile, so I figured I'd use the performance profile as a base to create a config that also performs well during download.
Libtorrent 1.2.2. Deluge 2.0.3. Windows 10.
Right now I'm running the "high performance seed" profile with these settings. Notice anything wrong with this?
Edit: Updated to reflect present settings.
Also, I'm uncertain about "piece_extent_affinity". I understand it's supposed to help with disk I/O, but at the same time it can hurt performance. Does anyone have any experience with this setting?
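For completeness, a sketch of how such a config might be assembled in the Python bindings, assuming they expose the high_performance_seed() preset as a settings dict (the overrides shown are just the ones discussed in this thread):

```python
import libtorrent as lt

# Assumption: the 1.2 bindings return the preset as a plain dict of settings.
settings = lt.high_performance_seed()
settings.update({
    'unchoke_slots_limit': -1,      # unlimited, instead of the preset's 3000
    'piece_extent_affinity': True,  # the setting asked about above
})
ses = lt.session(settings)
```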