Closed Alexey104 closed 2 years ago
are you using a 64 bit build of libtorrent?
The 64 bit build uses memory mapped files and multi-threaded disk I/O, which is expected to perform better than the 32 bit counterpart. The 32 bit build uses fopen()/fclose(), is single-threaded and opens and closes the file for every access (which normally isn't very fast on Windows).
I see, you're building the binding in 64 bit mode, but not the main library. However, on Linux I believe 64 bit builds are the default anyway, so that probably doesn't make a difference.
Do you know whether you're CPU-, network- or Disk I/O bound?
Thank you for your feedback!
are you using a 64 bit build of libtorrent?
Yes, it's 64-bit. I've recompiled with address-model=64 just to be sure.
Do you know whether you're CPU-, network- or Disk I/O bound?
Disk is definitely the bottleneck for me. I use an old slow 5200rpm 2TB WD Red HDD for torrenting. But here below is the comparison of downloading the same torrent using RC_1_2(upper) and RC_2_0(lower) :
Disk is definitely the bottleneck for me.
Have you tried with RAMdrive to confirm that? @arvidn : what's the change in RC2 vs RC1.x ?
The main (disk related) change is the use of memory mapped files for disk I/O. But since not all systems support memory mapped files, there's a simple, single-threaded fopen()-based fallback for such cases. The mmap backend also maps whole files, which means it needs a lot of virtual memory, so it is only enabled in 64 bit builds.
I say this just to make sure that it is in fact the mmap-based back-end you end up using. If not, that would be the most obvious explanation for the slow-down.
Other than that, there are session-stats that can be printed to the log (and analysed with a script in libtorrent). Short of doing that though, you could experiment with various disk I/O settings, like the number of I/O threads.
I would expect the kernel to do a decent job of caching disk writes in memory and then write it to disk in a reasonable order to maximize throughput. libtorrent-1.2.x is not sophisticated in which order it writes blocks to disk. Obviously, Linux won't cache dirty pages indefinitely and once you reach a steady state might be limited by the HDD write speed.
Do you have a lot of RAM?
One change I can imagine might make a difference is that libtorrent-1.2.x had an application level disk cache which it would allocate and store blocks in while waiting for the disk. It's possible that the torrents you download all fit in your working set, and can essentially download to RAM, and then flush to disk in the background. In libtorrent-2.0 it's up to the kernel to balance how many dirty pages it keeps for the bittorrent process before it starts throttling it. So it's possible the kernel is more conservative than the "manual" disk cache in 1.2.x.
Do you experience a high download rate at first, to have it drop once some cache threshold is reached?
Do you have a lot of RAM?
8 Gigs.
Do you experience a high download rate at first, to have it drop once some cache threshold is reached?
No, download speed is constantly low from the beginning to the end(compared to RC_1_2).
One change I can imagine might make a difference is that libtorrent-1.2.x had an application level disk cache which it would allocate and store blocks in while waiting for the disk. It's possible that the torrents you download all fit in your working set, and can essentially download to RAM, and then flush to disk in the background. In libtorrent-2.0 it's up to the kernel to balance how many dirty pages it keeps for the bittorrent process before it starts throttling it. So it's possible the kernel is more conservative than the "manual" disk cache in 1.2.x.
Yes, I can see the difference in that with RC_1_2 RAM is constantly filled with cache_size * 16 KiB blocks of data during the entire session, whereas with RC_2_0 cache_size doesn't matter anymore, and RAM usage changes dynamically depending on the current workload.
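For reference, the arithmetic behind "cache_size * 16 KiB" can be sketched as below. The helper name cache_bytes is made up for illustration and is not part of libtorrent; the 2048-block figure is just an example value.

```cpp
#include <cstdint>

// One block is 16 KiB; in RC_1_2 cache_size is counted in such blocks.
constexpr std::int64_t block_size_bytes = 16 * 1024;
constexpr std::int64_t mib = 1024 * 1024;

// Hypothetical helper: upper bound on the RAM held by the 1.2.x
// application-level cache for a given cache_size (in blocks).
constexpr std::int64_t cache_bytes(std::int64_t cache_size_blocks)
{
    return cache_size_blocks * block_size_bytes;
}

// Example: a cache_size of 2048 blocks pins 32 MiB for the whole session.
static_assert(cache_bytes(2048) == 32 * mib, "2048 blocks * 16 KiB = 32 MiB");
```

With RC_2_0 there is no equivalent fixed figure, since the kernel decides how many dirty pages to keep.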
Other than that, there are session-stats that can be printed to the log (and analysed with a script in libtorrent). Short of doing that though, you could experiment with various disk I/O settings, like the number of I/O threads.
Thank you, I will dig into it when I have more time and report back if I make any progress on this.
@Alexey104 To prove that you are right, you can take two programs: Libretorrent (2.0.4 libtorrent) and Media Library (1.2.13 libtorrent). Install it on one Android device and enable video recording. One by one download the same torrent. Upload the video to the cloud. For example mega.nz. Then post the link here.
I'm also experiencing 3x slower download speeds with libtorrent 2.0.5 compared to 1.2.x (with qbt frontend on Arch Linux). I have a 2.5 Gbps connection and when using 1.2.x I'm able to saturate my connection at a sustained ~300 MiB/s (2500 Mbps) with ~100 seeders (single or multiple torrents). When I switch to libtorrent 2.x, I can only reach 100 MiB/s (~700 Mbps) no matter how many torrents or seeders. That means I am losing 1.5-2 Gbps worth of download speed. That's quite a hit. Switching back to 1.2.x immediately solves the problem.
I have tried playing with the IO threads but no dice. Are there other options I can try to change to replicate the old behaviour and download at a sustained 2.5 Gbps again?
FYI, I'm using an NVMe drive and able to read/write at 2600 MiB/s so drive speed isn't an issue. Also, my system has 8 GB of RAM and a dual-core i5-7360U. But these specs were fine to saturate my speed before with libtorrent 1.2.x.
I'm also experiencing 3x slower download speeds with libtorrent 2.0.5 compared to 1.2.x (with qbt frontend on Arch Linux).
"3x slower" means 1/3 the rate, right?
I have tried playing with the IO threads but no dice. Are there other options I can try to change to replicate the old behaviour and download at a sustained 2.5 Gbps again?
You could try increasing max_queued_disk_bytes, which defaults to 2 MB.
Thanks @arvidn. Yes, 1/3rd the rate. I tried to increase max_queued_disk_bytes in the source code, but it didn't change anything on my setup.
However, I did find the solution. qBittorrent defaults to a file_pool_size of 5000. When I put this down to 40 (the default in the libtorrent reference), I am able to download at a sustained 2.5 Gbps with 2.x. I set it to 10 because I found that more stable (less sudden drops in throughput). Is there a consequence of setting it that low?
An observation with memory usage (possibly related to #6667 and new mmap structure):
@sledgehammer999 ^^ it sounds like a smaller default file_pool_size makes sense
I set it to 10 because I found that more stable (less sudden drops in throughput). Is there a consequence of setting it that low?
The file pool is a cache of open files (and memory maps of those files). The size determines how many files are kept open at most. So, a small cache size means there will be more calls to open(), mmap(), munmap() and close(). Those calls aren't terribly expensive on Linux, so more calls probably won't make a very big difference.
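To make the trade-off concrete, here is a minimal sketch of a pool of open files with least-recently-used eviction. It only counts open and close events and does not touch real file descriptors. The class FilePoolSketch and its methods are hypothetical names, and LRU eviction is an assumption for illustration, not a description of libtorrent's actual file pool.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a cache of open (and mapped) files.
class FilePoolSketch
{
public:
    explicit FilePoolSketch(std::size_t capacity) : m_capacity(capacity)
    {
        assert(capacity > 0);
    }

    // Mark `path` as used. Returns true on a cache hit, false if the file
    // had to be (re)opened, which is where open()/mmap() would be called.
    bool access(std::string const& path)
    {
        auto it = m_index.find(path);
        if (it != m_index.end())
        {
            // Hit: move the entry to the front (most recently used).
            m_lru.splice(m_lru.begin(), m_lru, it->second);
            return true;
        }
        if (m_lru.size() == m_capacity)
        {
            // Evict the least recently used file. A real pool would call
            // munmap() and close() here.
            m_index.erase(m_lru.back());
            m_lru.pop_back();
            ++m_closes;
        }
        m_lru.push_front(path);
        m_index[path] = m_lru.begin();
        ++m_opens;
        return false;
    }

    std::size_t opens() const { return m_opens; }
    std::size_t closes() const { return m_closes; }
    std::size_t open_files() const { return m_lru.size(); }

private:
    std::size_t m_capacity;
    std::list<std::string> m_lru; // front = most recently used
    std::unordered_map<std::string, std::list<std::string>::iterator> m_index;
    std::size_t m_opens = 0;
    std::size_t m_closes = 0;
};
```

A pool of size 10 touching 100 files round-robin reopens a file on every access, while a pool of size 5000 never evicts but keeps every file open and mapped.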
I suspect the main reason this affects your download speed is that the virtual address space, and potential working set size increase is loosely proportional to the number of files you keep open. As you noted in (2), the virtual memory size is large when mapping a lot of large files. The working set being large just means the kernel hasn't found a better use for those pages yet. They should be among the first to be reclaimed when your system needs more RAM.
Using a smaller file pool will cause files to be munmap()ed more often, which is a pretty strong hint to the kernel that it should flush those dirty pages to disk.
It's not clear to me how the larger working set causes the slowdown in download rate though. Perhaps the process is being throttled in allocating new pages, or perhaps just the kernel maintenance of the pages is costly. There's more work to be done tuning this. One of my ideas is to test the new MADV_COLD advice, to really make pages first-in-line to be reclaimed.
Is this only linux related, or might be related for Windows too?
@arvidn How do you plan to use MADV_COLD? I'm not an expert on Linux memory management, but I naively added | MADV_COLD to advise in mmap.c then recompiled libtorrent and qbt, but it didn't seem to make any difference. Using a file_pool_size of 5000 still results in the download rate being capped at 1/3rd of the usual rate (and 60% MEM/3000G VIRT). Setting file_pool_size to 10-40 fixes the download rate again.
I don't use qbt, and file_pool_size is at the default (40). Still get slow downloads.
@Alexey104 Have you tried to set it to 10? Keep in mind I am not disk I/O bound since I'm using an NVMe drive. Even though the problem might be caused by the same change (mmap backend), the solution for your case might be different if you are in fact disk I/O bound.
Have you tried to set it to 10?
No, I haven't yet. I am sick with corona and just reading this thread laying at bed with phone. But I'll try when I feel good enough to move my ass to the desktop machine.
How do you plan to use MADV_COLD?
My idea (which I haven't had time to test yet) is to keep a ring-buffer of the most recently completed piece indices, i.e. the moment we have a piece and it passes the hash check, its piece number is inserted into the ring-buffer.
At that point, we have already hashed the piece, so the only reason it might get read again is if a peer asked for it. This is likely to happen since we download the rarest pieces first, so once we complete a piece, the largest number of other peers will be interested in it. For this reason, it should be considered "hot" (or at least not cold) for a while. If we receive a request from a peer for this piece, it's removed from the ring buffer (so, I suppose it needs an index too). If the piece index makes it all the way through the ring buffer, it means it hasn't been requested for a while, and we can then mark it as cold (with MADV_COLD), making it a candidate for eviction.
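A rough sketch of that ring buffer, under my own assumptions: RecentPieces, piece_completed, piece_requested and the mark_cold callback are names invented here, and nothing like this is claimed to exist in libtorrent. The callback is where a real implementation would issue the MADV_COLD advice for the piece's byte range.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <iterator>
#include <list>
#include <unordered_map>
#include <utility>

// Hypothetical tracker of recently completed pieces.
class RecentPieces
{
public:
    RecentPieces(std::size_t capacity, std::function<void(int)> mark_cold)
        : m_capacity(capacity), m_mark_cold(std::move(mark_cold))
    {
        assert(capacity > 0);
    }

    // Call when a piece has passed its hash check.
    void piece_completed(int piece)
    {
        if (m_index.count(piece)) return; // already tracked
        if (m_order.size() == m_capacity)
        {
            // The oldest piece made it all the way through the buffer
            // without being requested, so treat it as cold.
            int const oldest = m_order.front();
            m_order.pop_front();
            m_index.erase(oldest);
            m_mark_cold(oldest);
        }
        m_order.push_back(piece);
        m_index[piece] = std::prev(m_order.end());
    }

    // Call when a peer requests the piece: it is in use, so stop
    // tracking it and never report it as cold from here.
    void piece_requested(int piece)
    {
        auto it = m_index.find(piece);
        if (it == m_index.end()) return;
        m_order.erase(it->second);
        m_index.erase(it);
    }

    std::size_t size() const { return m_order.size(); }

private:
    std::size_t m_capacity;
    std::function<void(int)> m_mark_cold;
    std::list<int> m_order; // front = oldest completed piece
    std::unordered_map<int, std::list<int>::iterator> m_index;
};
```

The hash map is the "index" mentioned above; it makes removal on a peer request O(1) instead of a scan of the buffer.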
I naively added | MADV_COLD to advise in mmap.c then recompiled libtorrent and qbt, but it didn't seem to make any difference.
I wouldn't be surprised if this advice is removed the moment you access the memory, since it's no longer cold then, by definition. If you set this on all pages when you first map the file, every single page will become hot before you want them to be evicted. I don't think that approach would make a difference. You could try to advise MADV_COLD on every block that has been written. Basically, to tell the kernel to assume nobody will be interested in the data again. But, presumably, if a peer gets to requesting it before the kernel evicts it, it won't be considered cold anymore. Perhaps this would be sufficient, over my more complex algorithm above.
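The simpler "advise every written block" variant could look roughly like this. It is only a sketch: advise_block_cold and write_block_cold are hypothetical helpers, not functions from libtorrent's mmap code, and it assumes `base` is the page-aligned address returned by mmap(). MADV_COLD needs Linux 5.4 or later, so the call is guarded and turns into a no-op elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <sys/mman.h>
#include <unistd.h>

// Hint that [offset, offset + len) of a mapping is unlikely to be read
// again soon. Returns true if the kernel accepted the hint.
inline bool advise_block_cold(void* base, std::size_t offset, std::size_t len)
{
    assert(base != nullptr);
#ifdef MADV_COLD
    // madvise() requires a page-aligned start address, so widen the range
    // down to the page boundary that contains `offset`.
    auto const page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    std::size_t const start = offset - (offset % page);
    char* addr = static_cast<char*>(base) + start;
    return madvise(addr, len + (offset - start), MADV_COLD) == 0;
#else
    (void)offset;
    (void)len;
    return false; // no MADV_COLD on this platform
#endif
}

// Copy one block into the mapping, then immediately advise it cold.
inline bool write_block_cold(void* base, std::size_t offset,
    char const* data, std::size_t len)
{
    std::memcpy(static_cast<char*>(base) + offset, data, len);
    return advise_block_cold(base, offset, len);
}
```

The advice is only a hint and the data stays readable afterwards. On systems with pages larger than the block size, widening the range also covers part of a neighbouring block.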
Lowering file_pool_size and increasing aio_threads didn't make any difference for me at all.
Any updates on this?
please give this patch a try: https://github.com/arvidn/libtorrent/pull/6703
@arvidn, thank you for your hard work, but in my case, unfortunately, this patch didn't make any difference. Still get at most 1/2 of my usual download rate.
@Alexey104 that patch only makes a difference (by default) on Windows. I haven't received many reports of this being a problem on Linux. After applying that patch, you also need to enable the option disk_io_write_mode = settings_pack::write_through.
That will use the MADV_PAGEOUT hint, which is new in Linux 5.4. If you have an older kernel, that won't make a difference either.
Hi @arvidn,
EDIT: See post below. These results are not correct because I compiled RC_2_0 with the debug flag and 0 optimizations. Everything worked fine after I compiled with Release flag (which adds -O3).
So I'm seeing vastly different throughput performance in v2.0.5 vs. RC_2_0 branch. I tried your patch too and set disk_io_write_mode = settings_pack::write_through in the qbittorrent source, but it still did not help. This is on Arch Linux (kernel 5.16.2), not Windows.
Here is a comparison of the three cases with throughput graphs (measured over 5 minutes). All cases had 16 I/O threads, 4 hashing threads, and 10 file pool size.
~~1. Tag v2.0.5: Able to maintain 2.5 Gbps throughput. CPU usage for qbt process is 100-200% while downloading, but QBT web UI is still responsive while downloading at max throughput.
disk_io_write_mode = settings_pack::write_through. Doesn't seem to make any difference compared to RC_2_0.~~
As you can see, there seems to be some problem introduced between v2.0.5 and current RC_2_0 that severely degrades the throughput, and the patch does not seem to help. The QBT web UI being unresponsive in RC_2_0 might be related to this throughput issue too.
Do you have any idea what could have caused this change since v2.0.5?
Sorry @arvidn, please disregard my previous post. I made a mistake and compiled RC_2_0 with debug flags and no optimizations, while 2.0.5 was compiled with optimizations and no debug flags. The lack of optimizations and debug info made my RC_2_0 throughput suffer.
After I recompiled RC_2_0 with the CMake flag -DCMAKE_BUILD_TYPE=Release, which applies O3 optimizations, I no longer have any throughput issues with RC_2_0 and can reach and maintain full throughput without any drops, as can be seen from this throughput graph.
Sorry for the confusion.
heh, I was just about to suggest reviewing how the different versions were built :)
@sledgehammer999 ^^ it sounds like a smaller default file_pool_size makes sense
I was hoping that would be the same for me, sadly with it set to 10, I keep getting the same slow DL speed as with the default value. I also only have on average 1 or 2 torrents actively downloading.
I think the file pool size indirectly affects the likelihood of having many dirty pages. Presumably, a more reliable approach is to explicitly page out dirty pages, like this patch: https://github.com/arvidn/libtorrent/pull/6703
please give this patch a try: #6703
@Alexey104 that patch only makes a difference (by default) on Windows. I haven't received many reports of this being a problem on Linux. After applying that patch, you also need to enable the option disk_io_write_mode = settings_pack::write_through.
That will use the MADV_PAGEOUT hint, which is new in Linux 5.4. If you have an older kernel, that won't make a difference either.
Does enabling disk_io_write_mode = settings_pack::write_through have any effect on Windows/macOS or is it specific to just Linux?
Windows 10 x64 here.
On my side, top speed is the same (around 2gb/sec, the max of my internet line), but rc1.2 can maintain that all the way, while 2.0 can't. Up and down, up and down, like it's slowing down when it feel the need to write to the disk, while rc 1.2 can do both no problem...
(EDIT : on rc 1.2 I'm using a cache of only 32 mb. No such setting on rc 2.0. I don't hit any memory limit, and I ve the same behaviour with or without os cache enabled.)
Does enabling disk_io_write_mode = settings_pack::write_through have any effect on Windows/macOS or is it specific to just Linux?
It has an effect on all major OSes. On Windows it uses FlushViewOfFile() and on POSIX systems it uses msync().
@Rootax it sounds like maybe the disk write buffer needs to be a bit larger in libtorrent 2.0.x. The 32 MiB in libtorrent 1.2 may still exceed that while flushing to disk, so it's possible to end up using more than that in practice.
Right now, libtorrent defaults max_queued_disk_bytes to just 1 MiB. Maybe that really should be closer to 32 MiB. Would you mind trying that?
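To illustrate why a small queue limit could produce stop-and-go downloads, here is a toy model of a byte-counted write queue. It assumes the limit works as a simple watermark: once the bytes waiting for the disk thread reach the limit, peers stop reading from their sockets until the disk catches up. DiskWriteQueueSketch and its methods are invented for this sketch and are not libtorrent's implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a write queue bounded by a byte limit.
class DiskWriteQueueSketch
{
public:
    explicit DiskWriteQueueSketch(std::int64_t max_queued_bytes)
        : m_limit(max_queued_bytes)
    {
        assert(max_queued_bytes > 0);
    }

    // A block was received and handed to the disk thread.
    // Returns true if peers may keep reading from their sockets.
    bool block_queued(std::int64_t bytes)
    {
        m_queued += bytes;
        return !throttled();
    }

    // The disk thread finished writing a block.
    // Returns true if peers may resume (or keep) reading.
    bool block_written(std::int64_t bytes)
    {
        assert(bytes <= m_queued);
        m_queued -= bytes;
        return !throttled();
    }

    bool throttled() const { return m_queued >= m_limit; }
    std::int64_t queued_bytes() const { return m_queued; }

private:
    std::int64_t m_limit;
    std::int64_t m_queued = 0;
};
```

With a 1 MiB limit the queue fills after 64 blocks of 16 KiB, which at 2.5 Gbps is a few milliseconds of traffic; a 32 MiB limit absorbs 2048 blocks before throttling.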
@Rootax it sounds like maybe the disk write buffer needs to be a bit larger in libtorrent 2.0.x. The 32 MiB in libtorrent 1.2 may still exceed that while flushing to disk, so it's possible to end up using more than that in practice.
Right now, libtorrent defaults max_queued_disk_bytes to just 1 MiB. Maybe that really should be closer to 32 MiB. Would you mind trying that?
Already pushed at 32mb, even tried 64mb. It helps at first (like i'm stable for a little longer), but at some point the yoyo behaviour is back.
For the record the final storage is not the problem (I try on nvme ssd and a ram drive)
Does enabling disk_io_write_mode = settings_pack::write_through have any effect on Windows/macOS or is it specific to just Linux?
It has an effect on all major OSes. On Windows it uses FlushViewOfFile() and on POSIX systems it uses msync().
Seems in qBittorrent we still use settingsPack.set_int(lt::settings_pack::disk_io_write_mode, mode);
So perhaps we need to do something like below:
#ifdef QBT_USES_LIBTORRENT2
settingsPack.set_int(lt::settings_pack::disk_io_write_mode, lt::settings_pack::write_through);
#else
settingsPack.set_int(lt::settings_pack::disk_io_write_mode, mode);
#endif
Maybe not related, but I noticed another weird behaviour on my system (Win 10 x64, 32 GB of RAM, latest qbt/libt).
When qbt decides to write the data to the disk, the disk is reading the data too. Like, read 200 MB/s, write 200 MB/s, at the same time, until everything is flushed. I didn't see this behaviour with lib 1.2. Did the data checking change in lib 2.0, so that, I don't know, data is read again before writing or something like that?
Thx.
but rc1.2 can maintain that all the way, while 2.0 can't. Up and down, up and down, like it's slowing down when it feel the need to write to the disk, while rc 1.2 can do both no problem...
I also observe this behavior. But max download speed also suffers significantly in my case. Cannot reach even half of my bandwidth with 2.0.
When the speed is dying, I see that QBT show that the write cache is overloaded (whatever this is measuring with lib 2.x since the cache is different). Doesn't happen with lib 1.2.x.
Maybe not related, but I noticed another weird behaviour on my system (Win 10 x64, 32 GB of RAM, latest qbt/libt).
When qbt decides to write the data to the disk, the disk is reading the data too. Like, read 200 MB/s, write 200 MB/s, at the same time, until everything is flushed. I didn't see this behaviour with lib 1.2. Did the data checking change in lib 2.0, so that, I don't know, data is read again before writing or something like that?
Thx.
Maybe this is related to this :
"The read access occurs because as the pages are accessed for the first time they need to be read in from disk. The OS is not clairvoyant and doesn't know that the reads will be thrown out.
To avoid the issue, don't use mmap(). Build the blocks in buffer and write them out the old fashioned way."
Maybe I'm all wrong. But if this is the normal behaviour, it doesn't help the performances that's for sure...
I recently landed this which might impact this issue: https://github.com/arvidn/libtorrent/pull/6861
Apart from defaulting hashing_threads to 1, it also relaxes the hashing operations of pieces as they are being downloaded to be run on any aio_thread. The main reason to use 1 hashing thread for full file checking is for them to read sequentially from the drive.
During download, however, the pieces being checked are most likely still in the cache, and random access should be fine.
@Arvid, Thank you! I will try it and report back about the results.
Unfortunately, I don't see any significant improvements :( Although it seems to be a little bit better with #6861(lower), it is still far from being as good as RC_1_2(upper):
Unfortunately, I don't see any significant improvements :( Although it seems to be a little bit better with #6861(lower), it is still far from being as good as RC_1_2(upper): ...
Is download speed still tanking when pieces are written to disk ?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Was also getting significantly slower download speeds downloading with qBitTorrent 4.5.2 using libtorrent2. Switched back to the same build of qBitTorrent with libtorrent1.2 and speeds go from ~ 11MB/s to 70-90MB/s.
Everyone seems to be trying to forget or not notice problems with speed, but the problem has not gone away. People who have an Internet connection above 100 Mbps suffer from this to a greater extent. Using lt2.0 greatly affects the download and upload speed, as well as many, many related problems. https://github.com/qbittorrent/qBittorrent/issues/16043
Here I didn't forget, I just gave up on 2.x a while ago. I build what I need with 1.2 and it's working great. Great speed, no weird i/o behaviour, etc....
On Wed, 21 June 2023 at 13:08, stalkerok @.***> wrote:
Everyone seems to be trying to forget or not notice problems with speed, but the problem has not gone away. People who have an Internet connection above 100 Mbps suffer from this to a greater extent. Using lt2.0 greatly affects the download and upload speed, as well as many, many related problems. qbittorrent/qBittorrent#16043 https://github.com/qbittorrent/qBittorrent/issues/16043
I also stopped using lt2.0, but seeing that the lt1.2 version is almost not developed, I'm worried that it will soon die altogether. It has already gotten to the point where qbittorrent is dropping i2p support, since i2p is dead in lt1.2 (https://github.com/qbittorrent/qBittorrent/pull/19207). This is in addition to the fact that lt1.2 does not contain a huge number of changes and corrections besides i2p.
Is this still an issue with libtorrent 2.0.9?
I'll tell you more. The speed of libtorrent 1.2 on tcp\udp torrent protocol is less than on https protocol. Use my torrent client. It's the only one in the world which supports distribution of torrents via https protocol.
libtorrent version (or branch): 2.0.4.0
platform/architecture: ArchLinux_x64
compiler and compiler version: GCC 11.1.0
Hello! I am using the latest Deluge(2.0.4.dev85) and have just switched from LT 1.2.14 to 2.0.4. I use the following LT settings(the others are defaults):
I've been using these settings for a long time, and they have always worked well for me with RC_1_2. My ISP speed is 600 Mb/s, and with the settings above I get 40-65 MB/s download speed on good torrents (with many seeds) when using LT 1.2.14, but when using LT 2.0.4 my download speed on the same torrents with the same settings is only 10-25 MB/s despite thousands of seeds.
I've compiled Libtorrent with the following command:
Python bindings:
Yes, I use slow HDD for torrenting, however, download speed becomes 2-3 times faster if I just switch back to RC_1_2 with the same client and libtorrent settings.
Are there any settings I can try to tune to improve my download speed? Thank you!