arvidn / libtorrent

an efficient feature complete C++ bittorrent implementation
http://libtorrent.org

RC_2_0: Write cache doesn't flush to disk #6522

Open sledgehammer999 opened 3 years ago

sledgehammer999 commented 3 years ago

Please provide the following information

libtorrent version (or branch): RC_2_0 f4d4528b89bdbcef80efa8f5b99cc0e0f92226cb

platform/architecture: Windows 7 x64 sp1

compiler and compiler version: msvc2017

please describe what symptom you see, what you would expect to see instead and how to reproduce it.

To better observe this problem you need a torrent with one big file (e.g. 10-16 GB) and a fairly fast connection (e.g. 100 Mbps). My system has 16 GB RAM. It doesn't matter whether I enable the OS cache or not: the downloaded data seems to reside in RAM for far too long. While the file downloads I observe that both the Working Set of qBittorrent and the system RAM usage go up constantly. I assume this is due to OS caching. However, it doesn't seem to flush to disk at regular intervals. Minutes pass and gigabytes of data are downloaded, but the flushing doesn't happen. Say I have a file manager window open (explorer.exe) and I navigate to the file. No matter how many times I open the file properties, its size on disk doesn't change. There are 2 ways I have coerced it to flush to disk:

  1. Pause the torrent. Disk activity goes up and RAM usage starts going down. And depending on the size of the cache this can start freezing up the system.
  2. From qBittorrent, either right-click to Open Containing Folder or double-click the file to launch the associated media player. These actions basically call a shell API to do the work, but somehow they also make Windows finally flush to disk.

From the little documentation online about the Windows file cache it seems that every second it would commit 1/8 of the cached data to disk. But it doesn't happen with RC_2_0.
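As a rough sanity check of that figure, here is a small sketch of what the 1/8-per-second rule would imply. It assumes the rule exactly as stated above (not verified against Windows documentation) and a simplified model in which, each second, 1/8 of the cached data is written back and then one second's worth of new data arrives. The function name `dirty_after` is made up for illustration.

```cpp
#include <cassert>

// Simplified model of the "1/8 of the cache per second" rule (an assumption
// taken from the comment above, not a documented guarantee).
// Each simulated second: write back 1/8 of what is cached, then add the data
// downloaded during that second. Returns the cached (dirty) amount in MB.
double dirty_after(double rate_mb_per_s, int seconds)
{
    assert(rate_mb_per_s >= 0.0 && seconds >= 0);
    double dirty_mb = 0.0;
    for (int i = 0; i < seconds; ++i)
        dirty_mb = dirty_mb * 7.0 / 8.0 + rate_mb_per_s;
    return dirty_mb;
}
```

Under this model the cached amount levels off at 8 times the per-second download rate. At 100 Mbps (12.5 MB/s) that is about 100 MB, so gigabytes of unflushed data would not be expected if the rule applied here.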

This can have serious effects on end users:

  1. Obvious: Data corruption. A power loss and poof goes the cached data, especially when it is gigabytes' worth. Also, if the user shuts down the PC while the huge cache is flushing, the shutdown process will take several minutes. In this case, an inexperienced user might just pull the plug (or hold the power button) to force a shutdown. Poof goes the data.
  2. Eventual system performance degradation: When the RAM fills I assume that Windows will start flushing. This will cause system freezes. But before that Windows might start paging other programs to disk which will also cause performance issues.
  3. Download performance: If Windows starts flushing a huge cache because it was coerced, the download speed will drop or even stop altogether until the disk activity drops down again.

Furthermore, I also tested against the latest RC_1_2. This doesn't happen there, and it also doesn't matter there whether I enable the OS cache or not. I know that the file I/O subsystem has changed fundamentally between RC_1_2 and RC_2_0, but I mention it in case it matters. In that test I had set cache_expiry to 60 seconds and the cache size to 65 MiB. AFAIK these options don't exist in RC_2_0.

PS: To demonstrate the importance of the problem: I observed this while I had something downloading in the background and I was doing "office work" (browsing, opening PDFs, writing in Word etc.), which is light in terms of disk and RAM demand. Yet suddenly the system was freezing up randomly. I opened Task Manager and my 16 GB of RAM had almost filled up. I saw that the disk activity was up. It took at least 20 minutes for things to be usable again.

arvidn commented 2 years ago

> If you have a fast enough connection you'll still see memory exhaustion no?

I don't believe so, but please share if you do. depending on what you mean by "exhaustion" of course. All memory being used seems fine, only if it slows down the system would I consider it a problem.

arvidn commented 2 years ago

@djtj85 do you experience the cache not flushing early enough also? Do you have any more symptoms to contribute? which operating system are you on?

arvidn commented 2 years ago

I'm planning a few patches to attempt to address this issue.

I just landed an improvement to the benchmarks, where it now also measures memory usage (but only on linux so far). https://github.com/arvidn/libtorrent/pull/6679

This addresses the memory priority when checking files on unix systems: https://github.com/arvidn/libtorrent/pull/6681

This ticket is specifically about windows and I have two more patches planned.

  1. Lower the memory priority even further on windows. Currently I set it to MEMORY_PRIORITY_BELOW_NORMAL but that's just one step below normal, I could set it to MEMORY_PRIORITY_LOW
  2. explicitly flush dirty pages that have not been requested for some time (using MADV_PAGEOUT and FlushViewOfFile())
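A minimal sketch of what item 2 could look like. This is not libtorrent's actual code, and `flush_mapped_range` and `write_via_mapping` are hypothetical names. On Windows it calls FlushViewOfFile(). On POSIX it uses msync(MS_SYNC) as a portable stand-in for MADV_PAGEOUT, which is Linux-specific (5.4 and later) and asks the kernel to reclaim pages rather than only write them back.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

#ifdef _WIN32
#include <windows.h>
#else
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#endif

// Ask the OS to write the dirty pages of a file-mapped range back to the file.
// Windows: FlushViewOfFile() starts the write-back but does not wait for the
// data to reach the disk. POSIX: msync(MS_SYNC) blocks until the write-back is
// done; `addr` must be page-aligned (the start of a mapping always is).
bool flush_mapped_range(void* addr, std::size_t len)
{
#ifdef _WIN32
    return FlushViewOfFile(addr, len) != 0;
#else
    return ::msync(addr, len, MS_SYNC) == 0;
#endif
}

#ifndef _WIN32
// POSIX-only demo helper (hypothetical): writes `data` into a new file through
// a shared mapping, flushes the mapping, reads the bytes back through the file
// descriptor and deletes the file. Returns an empty string on any failure.
std::string write_via_mapping(const char* path, const std::string& data)
{
    assert(path != nullptr);
    int fd = ::open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) return {};

    std::string result;
    if (::ftruncate(fd, static_cast<off_t>(data.size())) == 0) {
        void* p = ::mmap(nullptr, data.size(), PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
        if (p != MAP_FAILED) {
            std::memcpy(p, data.data(), data.size());   // dirties the pages
            bool ok = flush_mapped_range(p, data.size());
            ::munmap(p, data.size());
            if (ok) {
                result.resize(data.size());
                ssize_t n = ::pread(fd, &result[0], result.size(), 0);
                if (n != static_cast<ssize_t>(result.size())) result.clear();
            }
        }
    }
    ::close(fd);
    ::unlink(path);
    return result;
}
#endif
```

A real implementation would also have to track which ranges are dirty and how long ago they were last requested, which this sketch leaves out.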

arvidn commented 2 years ago

@djtj85 I take that as a "yes". Can you describe the symptoms?

jonboy345 commented 2 years ago

I'm seeing this behavior on my Server 2019 box.

Historically, I've downloaded torrents directly to my NAS via SMB over a 10Gbps network using qBittorrent. Ryzen 1700x, 32GB RAM. I'm on a 1Gbps fiber WAN link, but limit my download speed to 60000 KiB/s to limit the issues below, but sometimes it still happens.

[Screenshot: qBit settings]

In the past (pre-qBit 4.4.0) I've seen qBittorrent downloading data from the WAN while little or no traffic went over the NIC dedicated to SMB (and the number of queued I/O jobs climbed higher and higher); eventually qBit would flush pieces to the NAS in bursts. Pre-4.4.0, RAM usage by qBittorrent wouldn't balloon to use all available RAM. I've been able to manage this by limiting the download speed to about 60 MB/s.

Here's what I see with 4.3.9 and downloading a large torrent.

With qBit 4.4.0, the above still occurs, and in addition I see SIGNIFICANTLY more received traffic (reads from the NAS disk) over the 10 Gbit SMB NIC while downloading, which doesn't make sense to me. Why is qBit reading more data from disk than it's writing? Further, qBit 4.4.0 uses all available RAM and, as before, writes very little to disk on the NAS while being heavy on reads.

And, now that I go to create a screenshot, I can't get qbit 4.4.0 to not crash on start up...

nagyimre1980 commented 2 years ago

we are still waiting for improvement! Qbittorrent 4.4.0 is unusable due to libtorrent 2.x!

Do you already understand the problem?

arvidn commented 2 years ago

please give this patch a try: https://github.com/arvidn/libtorrent/pull/6703

arvidn commented 2 years ago

> 99% RAM wouldn't be so drastic but i can not open a google chrome or any other apps or games without the system being uncertain.

so, you can open a chrome tab and start other apps, but the system is "uncertain"? By "uncertain", do you mean the whole system crashes, or freezes?

ghost commented 2 years ago

@arvidn commit https://github.com/arvidn/libtorrent/commit/eda4baa0416dc01f93d0f5a12e640d9a96859caa

I see RAM usage climbing up to 99% but with stable download speed. When 99% is reached, the system starts to lag and even typing becomes difficult. Then after a few seconds, I think Windows flushes the writes to disk and RAM usage goes down to 30-35%. During the flushing window the download speed plummets to a few MiB/s. After the flush, RAM usage starts climbing again with stable speeds.

With latest commit https://github.com/arvidn/libtorrent/commit/a45ead24b206c4804e483ae2154173a833dd5c29

RAM usage climbs to 99% and stays there. System does not become unresponsive like before. However, the download speed is very unstable.

ghost commented 2 years ago

I think my download speed was way higher than libtorrent could validate the pieces and write them to disk. Because even when my download was complete 100%, the torrent was showing as stalled in qBittorrent. And I could see there were still 1000+ pieces left to be validated. Once they were validated the torrent was complete. So I think this method doesn't serve well for high speed users like me! And for qBt devs, I think it's less intuitive to show such torrents as stalled. Maybe show validating pieces in status!

ghost commented 2 years ago

I have also tried with the OS cache disabled, using the latest commit. It still seems to suffer when RAM hits 99%. But after 1-2 minutes RAM usage resets back to 35%, after which the speed recovers and RAM usage climbs again. This repeats in cycles.

ghost commented 2 years ago

I bumped my aio_threads from 10 to 20 and this time got different results. My RAM usage went up to ~80% and the speed was stable. But then the speed went down to zero for around 90 seconds. It recovered after that and RAM usage went down to 20%. Note: OS cache disabled, latest commit.

ghost commented 2 years ago

I tested again, this time on an SSD. It was able to sustain the speed even at 99% RAM usage. So I think this issue only affects high-speed downloaders using spinning disks.
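
A back-of-the-envelope sketch of why the disk type could matter. It assumes the simplest possible model: unwritten data piles up in RAM at the difference between the download rate and the sustained disk write rate. The function name and all numbers in the usage note are hypothetical, not measurements from this thread.

```cpp
#include <cassert>
#include <limits>

// Seconds until `free_mib` of RAM is filled with not-yet-written data when
// data arrives at `download_mib_s` and the disk drains it at `disk_mib_s`.
// Returns infinity when the disk keeps up with the download.
double seconds_until_full(double free_mib, double download_mib_s, double disk_mib_s)
{
    assert(free_mib >= 0.0 && download_mib_s >= 0.0 && disk_mib_s >= 0.0);
    const double growth_mib_s = download_mib_s - disk_mib_s;
    if (growth_mib_s <= 0.0)
        return std::numeric_limits<double>::infinity();
    return free_mib / growth_mib_s;
}
```

For example, with 10 GiB (10240 MiB) free, a 100 MiB/s download and a disk that sustains 60 MiB/s, the backlog grows by 40 MiB/s and RAM fills after 256 seconds. With a disk that sustains 400 MiB/s, the backlog never builds up in this model.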