ShaneIsrael / fireshare

Self host your media and share with unique links
GNU General Public License v3.0

Excessive Disk Use #224

Closed Renegade605 closed 7 months ago

Renegade605 commented 7 months ago

Describe the bug
Scheduled scan takes 3+ minutes to run (for 408 files / ~400 GB). For the duration of the scan, disk I/O sits at 100%, reading at ~100 MB/s, and any other process trying to use the same disk gets completely tanked. Specifically in this case, my array is running a parity check operation, which also extends the scan time to 4 minutes. At the normally scheduled interval of 5 minutes between scans, this means the disks are only able to operate at normal capacity 20% of the time.

To Reproduce
Run the container.

Expected behavior
Scanning the library should be possible without pegging disk I/O at 100%. There are numerous other applications running on my server that scan libraries of video files, and none of them block up an entire disk for minutes at a time like this.

Screenshots

With parity operation paused: [screenshot]

With parity operation running: [screenshot]

ShaneIsrael commented 7 months ago

This is not a bug; it's a problem with how you are using Docker. The container is doing its job, which is scanning files. It's going to scan those files as fast as it can because you've left your Docker settings at the defaults, which basically lets any container use as much of your CPU as it needs.

In your Docker setup, whether that's a docker run command, Docker Compose, Kubernetes, or some other container host, you need to configure limits to your own liking. Docker allows you to limit both the number of CPUs a container can use and how many CPU cycles it gets.
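As a rough example only, and not Fireshare's documented configuration: the compose snippet below shows where such limits would go. The service name, image tag, device path, and numbers are all placeholders you'd replace with your own.

```yaml
services:
  fireshare:
    image: shaneisrael/fireshare:latest   # placeholder tag
    cpus: "1.0"                           # cap the container at roughly one CPU's worth of cycles
    blkio_config:
      device_read_bps:                    # throttle reads from the disk holding the library
        - path: /dev/sdX                  # placeholder device
          rate: "50mb"
```

The equivalent docker run flags are --cpus and --device-read-bps.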

Please read up on how to do that in your system.

ShaneIsrael commented 7 months ago

I should also mention, the reason it takes as long as it does is that you're having it scan 400 GB of files. It's comparing the hashes of each and every one of those files, which is not a quick task, and that's why it's pegging your CPU for so long. Which, again, is all the more reason to apply CPU limits to the container.

Renegade605 commented 7 months ago

Lmao. I'm not some amateur who doesn't know how Docker works; my configuration is absolutely not the default.

The CPU shares are already limited, and it isn't the CPU that's pegged. The disk is, and that's definitely not necessary just to look for new files in a folder. I don't know why you'd actually read all 400 GB on every scan when all you need to know is whether there's anything new. Tdarr, for example, which is a pretty unpolished piece of software, can scan a 28 TB library for new entries in about 15 seconds without nerfing disk performance.
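For illustration only, here is a sketch of the metadata-only approach being described (not Tdarr's or Fireshare's actual code): a scan that only looks for new or changed files can get away with stat() calls instead of reading contents. Every name below is invented.

```python
from pathlib import Path

# Hypothetical index from a previous scan: path -> (size, mtime).
Index = dict[str, tuple[int, float]]

def quick_scan(library: Path, index: Index) -> list[Path]:
    """Flag new files and files whose size/mtime changed, without reading any contents."""
    changed = []
    for path in library.rglob("*"):
        if not path.is_file():
            continue
        st = path.stat()
        if index.get(str(path)) != (st.st_size, st.st_mtime):
            changed.append(path)
            index[str(path)] = (st.st_size, st.st_mtime)
    return changed
```

Only the files this flags would then need any expensive per-byte work.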

But if you want to continue to read gigabytes of data for no reason, I guess there's no need to work on a better way, so carry on.

ShaneIsrael commented 7 months ago

@Renegade605 yes, you are an amateur if you're assuming that the scanning Fireshare does is the same type of scanning Tdarr does.

As I mentioned, Fireshare's scan generates a hash of every single file on every single scan and compares it to what is in the database, both to know whether the file has changed and to create a unique link to that file that doesn't depend on its name. This way, if even a single byte of data in that file is different, a completely new entry and unique link are generated, because the hash is different. It also means that if you change the names of your files, Fireshare's links will still work. You can send a link to someone, then a year later rename all of your files, and all of your old links will still work. Even if you change the location of the files, the links will still work as long as the files' data hasn't changed.

That is a core feature to this system.

TO DO THAT it must read every single file of your 400 GB. If doing that is causing you a problem because the disk I/O is slowing another process, don't run the scan every 3 minutes.
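For anyone following along, a content-hash scan of the kind described above can look roughly like this in Python. This is a sketch under my own assumptions, not Fireshare's actual implementation, and the function and variable names are invented.

```python
import hashlib
from pathlib import Path

def file_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the entire contents of a file in 1 MiB chunks (this reads every byte)."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

def scan(library: Path, known_hashes: set[str]) -> None:
    """Compare each file's content hash to what the database already knows about."""
    for path in library.rglob("*"):
        if not path.is_file():
            continue
        digest = file_hash(path)
        if digest not in known_hashes:
            # New or modified content: a real system would record it and mint a link here.
            print(f"new/changed: {path} -> {digest}")
```

Because identity is the digest of the file's contents, renames and moves don't break links; the trade-off is that every scan costs a full read of the library.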