ambv / bitrot

Detects bit rotten files on the hard drive to save your precious photo and music collection from slow decay.
MIT License

High Memory Usage #40

Open azurefreecovid opened 4 years ago

azurefreecovid commented 4 years ago

Hi there,

Love this library, just found it and it seems to work exactly as I want, except for one issue. It just ran my box out of memory and caused it to crash.

I'm trying to check a fairly large batch of files (about 3.7 TB, or 1,206,600 files) and bitrot really chews through the RAM, crashing my box. All up it seems to need about 4.3 GB of RAM to run, which does seem like a lot.
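Back-of-the-envelope, those numbers work out to a few kilobytes of resident memory per tracked file (assuming the 4.3 GB reported above means GiB):

```python
# Rough per-file memory arithmetic for the numbers reported above
# (4.3 GiB of RAM for 1,206,600 files) -- illustrative only.
total_ram_bytes = 4.3 * 1024**3
file_count = 1_206_600

per_file_kib = total_ram_bytes / file_count / 1024
print(f"~{per_file_kib:.1f} KiB of RAM per tracked file")
```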

I'd prefer not to split my checks into multiple smaller sets if at all possible, but obviously I can't have my system crashing.

Any ideas on what I can do to fix this issue?

My system is: AMD64, Debian Stretch, Python 3.8

ambv commented 4 years ago

Sadly I don't think this is actionable for our little project.

Honestly, plowing through 3.7 TB of data was never my intended use case for this. The problem is likely not the size of the data but the sheer number of files. There have been performance updates over the years that brought some data into memory so we don't have to reach into our SQLite database all the time. That makes things much faster but requires more memory.

I think that if we removed this optimization, you'd get lower memory usage, but you would wait a veeery long time for the checksums to calculate, making the tool useless regardless.
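The trade-off being described can be sketched roughly like this (hypothetical table and column names standing in for bitrot's actual SQLite schema):

```python
import sqlite3

# Hypothetical schema standing in for bitrot's database of known
# checksums; the real table/column names may differ.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE files (path TEXT PRIMARY KEY, hash TEXT)")
db.executemany("INSERT INTO files VALUES (?, ?)",
               [(f"/photos/{i}.jpg", f"hash{i}") for i in range(1000)])

# Option A: query SQLite once per file -- low memory,
# but one database round trip per path checked.
def lookup_db(path):
    row = db.execute("SELECT hash FROM files WHERE path = ?",
                     (path,)).fetchone()
    return row[0] if row else None

# Option B: load everything into a dict up front -- much faster lookups,
# but memory grows with the number of tracked files.
cache = dict(db.execute("SELECT path, hash FROM files"))

def lookup_cached(path):
    return cache.get(path)

assert lookup_db("/photos/42.jpg") == lookup_cached("/photos/42.jpg") == "hash42"
```

With a million-plus rows, option B's dict is what shows up as gigabytes of resident memory.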

If a process using too much memory is causing your entire box to crash, you should check what's up with that. It's a userspace application; this shouldn't happen no matter what it does.

azurefreecovid commented 4 years ago

Thanks for the reply, really appreciate it.

> Sadly I don't think this is actionable for our little project.

Totally understand, no problems at all.

> Honestly, plowing through 3.7 TB of data was never my intended use case for this. The problem is likely not the size of the data but the sheer number of files. There have been performance updates over the years that brought some data into memory so we don't have to reach into our SQLite database all the time. That makes things much faster but requires more memory.
>
> I think that if we removed this optimization, you'd get lower memory usage, but you would wait a veeery long time for the checksums to calculate, making the tool useless regardless.
>
> If a process using too much memory is causing your entire box to crash, you should check what's up with that. It's a userspace application; this shouldn't happen no matter what it does.

Unfortunately it is a feature of Linux-based systems. When the system runs out of RAM, the kernel starts killing things, hopefully the right things. On headless boxes where your web GUIs and other system access run through Docker containers, those can end up being killed, and hence the box crashes (at least as far as the user is concerned) and has to be power cycled to recover.
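When the OOM killer does fire, it leaves a trace in the kernel log, so you can confirm after the fact what was killed, and you can tell the kernel to strongly prefer not to kill a critical process (the `dockerd` target below is just an example):

```shell
# Check the kernel log for OOM-killer activity.
dmesg -T | grep -i 'killed process'
# Or, on systemd-based setups:
journalctl -k | grep -i 'out of memory'

# Exempt a critical process from the OOM killer (-1000 is the minimum
# score; requires root). Example target: the Docker daemon.
echo -1000 > /proc/"$(pidof dockerd)"/oom_score_adj
```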

Thanks again for the software. I think I have a solution (as of this afternoon), which is to give the box some swap space. That is essentially writing the SQLite data back to disk, just in a much less efficient manner. Anyway, I think it will solve the problem, and speed is not a concern for me (if it takes a week to run, that's fine), so it should be OK.
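For reference, on Debian a swap file can be set up along these lines (the path and size are illustrative; run as root):

```shell
# Create a 4 GiB swap file and enable it immediately.
fallocate -l 4G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile

# Persist across reboots.
echo '/swapfile none swap sw 0 0' >> /etc/fstab
```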

Keep up the great work.

ambv commented 4 years ago

> Unfortunately it is a feature of Linux-based systems. When the system runs out of RAM, the kernel starts killing things, hopefully the right things. On headless boxes where your web GUIs and other system access run through Docker containers, those can end up being killed, and hence the box crashes (at least as far as the user is concerned) and has to be power cycled to recover.

I'm aware of the OOM killer. Big companies like Facebook and Google disable it in their fleets because it is unpredictable. You can do it, too:

```shell
# echo 2 > /proc/sys/vm/overcommit_memory
```
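With `vm.overcommit_memory = 2` the kernel stops overcommitting: allocations past the commit limit fail with ENOMEM up front instead of the OOM killer picking a victim later. To make the setting survive reboots, a sysctl drop-in can be used (filename and ratio below are illustrative; run as root):

```shell
# Persist strict overcommit via a sysctl drop-in. overcommit_ratio
# tunes how much of physical RAM counts toward the commit limit.
cat > /etc/sysctl.d/90-overcommit.conf <<'EOF'
vm.overcommit_memory = 2
vm.overcommit_ratio = 80
EOF
sysctl --system
```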