cyanfish / naps2

Scan documents to PDF and more, as simply as possible.
https://www.naps2.com
Other

crash during PDF output w/OCR (out of memory) on Debian #316

Open aurbus opened 6 months ago

aurbus commented 6 months ago

Describe the bug: When I attempt to output a PDF with OCR, NAPS fills up all my RAM (16GB) and swap space (4 or 5GB, I think), and then crashes.

I am running Debian testing and NAPS 7.3.1, and I have been able to export PDFs this way in the past without issues.

Running dmesg -k shows the following output, which might help:

[ 1091.226375] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/user@1000.service/app.slice/app-naps2-1ed33cac23da4357822cf6923a59fc8d.scope,task=naps2,pid=4537,uid=1000

[ 1091.226550] Out of memory: Killed process 4537 (naps2) total-vm:290692092kB, anon-rss:11396184kB, file-rss:5552kB, shmem-rss:40152kB, UID:1000 pgtables:23344kB oom_score_adj:200
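To put the OOM-killer line in perspective, here is a small Python sketch that pulls the kB fields out of it and converts them to GiB (the kernel's "kB" here means 1024 bytes). total-vm, about 277 GiB, counts reserved virtual address space and is usually far larger than what a process really uses; anon-rss, about 10.9 GiB, is the resident anonymous memory and is the number that actually filled RAM. The regex is an assumption based on this one sample line, not a documented format.

```python
import re

# The second dmesg line above, copied verbatim.
OOM_LINE = (
    "Out of memory: Killed process 4537 (naps2) total-vm:290692092kB, "
    "anon-rss:11396184kB, file-rss:5552kB, shmem-rss:40152kB, UID:1000 "
    "pgtables:23344kB oom_score_adj:200"
)

def parse_oom_fields(line: str) -> dict[str, int]:
    """Return every 'name:<number>kB' field of an OOM-killer line, in KiB."""
    # Fields without a kB suffix (UID, oom_score_adj) are ignored.
    return {name: int(value) for name, value in re.findall(r"([\w-]+):(\d+)kB", line)}

def kib_to_gib(kib: int) -> float:
    return kib / (1024 ** 2)

fields = parse_oom_fields(OOM_LINE)
for name in ("total-vm", "anon-rss"):
    print(f"{name}: {kib_to_gib(fields[name]):.1f} GiB")
```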

To Reproduce: Export a PDF with OCR.

Expected behavior: The PDF would export properly.

Screenshots: n/a

Desktop: Debian testing "trixie", NAPS 7.3.1

cyanfish commented 6 months ago

Some questions:

Also, can you try running taskset -c 0-1 naps2 and see if that helps?
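For reference, taskset -c 0-1 naps2 starts NAPS2 with its CPU affinity restricted to CPUs 0 and 1. Presumably the point is to reduce how much work runs in parallel during export, but that is my inference; the thread doesn't say. The same thing can be done from Python with os.sched_setaffinity. This is a sketch, not part of NAPS2, and run_pinned is a hypothetical helper name.

```python
import os
import subprocess
import sys

def run_pinned(argv: list[str], cpus: set[int]) -> int:
    """Run argv with its CPU affinity limited to `cpus`, like `taskset -c` (Linux only)."""
    # Keep only CPUs this process may use; an empty mask would make
    # sched_setaffinity fail with EINVAL.
    target = os.sched_getaffinity(0) & cpus
    if not target:
        raise ValueError(f"none of CPUs {sorted(cpus)} are available")

    def pin_child() -> None:
        # Runs in the child after fork and before exec; the mask survives exec
        # and is inherited by every thread the program starts afterwards.
        os.sched_setaffinity(0, target)

    return subprocess.run(argv, preexec_fn=pin_child).returncode

if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: `python3 run_pinned.py naps2` behaves roughly like `taskset -c 0-1 naps2`.
    sys.exit(run_pinned(sys.argv[1:], {0, 1}))
```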

aurbus commented 6 months ago

The PDF is 671 pages, but I have had the same problem with PDFs about half as long.

The page images are fairly large, about 2000x4000 pixels each.

I have an i7-1165G7: 4 cores, 8 threads, 2.8GHz base clock, 28W.
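A rough sense of scale for these numbers, assuming (this is not taken from NAPS2's code) that each page is decoded into an uncompressed 32-bit-per-pixel bitmap while the PDF is written. Under that assumption one page is about 30.5 MiB and one page per hardware thread is about 244 MiB, while keeping every decoded page of a 671-page document in memory at once would need about 20 GiB, more than the 16GB of RAM on this machine. This is back-of-envelope arithmetic, not a diagnosis.

```python
# Back-of-envelope estimate only. Assumption (not from NAPS2's code): each page
# is decoded into an uncompressed 32-bit-per-pixel bitmap while being written.
WIDTH, HEIGHT = 2000, 4000      # page size reported in the thread
BYTES_PER_PIXEL = 4             # assumed 32-bit RGBA/BGRA
PAGES = 671                     # page count reported in the thread
LOGICAL_CPUS = 8                # i7-1165G7: 4 cores, 8 threads

per_page = WIDTH * HEIGHT * BYTES_PER_PIXEL      # bytes for one decoded page
in_flight = per_page * LOGICAL_CPUS              # one page per thread at once
all_pages = per_page * PAGES                     # if no page were ever freed

MIB, GIB = 1024 ** 2, 1024 ** 3
print(f"one page:        {per_page / MIB:7.1f} MiB")
print(f"8 pages at once: {in_flight / MIB:7.1f} MiB")
print(f"all 671 pages:   {all_pages / GIB:7.1f} GiB")
```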

I should note that I have been able to export PDFs more than twice as long, with pages at 2-3x the resolution of these, with no issues. I also tried NAPS 7.3.0 and 7.2 and had the same issue. I suspect it may be something with Debian rather than NAPS, but I am not 100% sure, and thought this was the best place to start.

cyanfish commented 6 months ago

I'm not sure; the log seems to claim that NAPS2 is using 11GB of memory, but I can't get it past 500MB-1GB (top -o %MEM) when running a similar test myself.
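To get a log over a whole export instead of watching top, a small Linux-only Python sketch can poll the VmRSS line of /proc/&lt;pid&gt;/status, which is the process's resident memory (roughly what top shows under RES). The helper names are hypothetical; pass the PID that top shows for naps2.

```python
import os
import sys
import time
from pathlib import Path

def read_rss_kib(pid: int) -> int:
    """Resident set size of `pid` in KiB, taken from the VmRSS line of /proc/<pid>/status."""
    for line in Path(f"/proc/{pid}/status").read_text().splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])  # line looks like "VmRSS:   123456 kB"
    # Zombie processes and kernel threads have no VmRSS line.
    raise ValueError(f"no VmRSS line for pid {pid}")

def watch_rss(pid: int, interval_s: float = 1.0) -> int:
    """Print the RSS of `pid` every `interval_s` seconds until it exits; return the peak in KiB."""
    peak = 0
    while True:
        try:
            rss = read_rss_kib(pid)
        except (FileNotFoundError, ProcessLookupError, ValueError):
            return peak  # the process has exited (or is a zombie)
        peak = max(peak, rss)
        print(f"{time.strftime('%H:%M:%S')}  RSS {rss / 1024:8.1f} MiB  (peak {peak / 1024:.1f} MiB)")
        time.sleep(interval_s)

if __name__ == "__main__":
    # Usage: python3 watch_rss.py <pid>
    if len(sys.argv) > 1:
        peak = watch_rss(int(sys.argv[1]))
        print(f"process exited; peak RSS {peak / 1024:.1f} MiB")
    else:
        print(f"this script's own RSS: {read_rss_kib(os.getpid())} KiB")
```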

If this is the native package, you could try the Flatpak (or vice versa) and see if there is any difference. You could also try experimenting with the images you're saving; maybe something about a particular image is causing the issue.
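If the culprit is a single image, bisecting the image set finds it in about log2(n) exports instead of one per image (around 10 for 671 pages). A minimal Python sketch, assuming exactly one bad image and a deterministic result; reproduces is a hypothetical callback standing in for "export this subset and check whether memory runs away", which in practice would be done by hand in NAPS2.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

def find_bad_item(items: Sequence[T], reproduces: Callable[[Sequence[T]], bool]) -> T:
    """Binary-search for the single item that makes `reproduces` return True.

    Assumes exactly one culprit and a deterministic check.
    """
    if not reproduces(items):
        raise ValueError("the full set does not reproduce the problem")
    lo, hi = 0, len(items)            # the culprit is somewhere in items[lo:hi]
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if reproduces(items[lo:mid]):
            hi = mid                  # culprit is in the first half
        else:
            lo = mid                  # with one culprit, it must be in the second half
    return items[lo]
```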

aurbus commented 6 months ago

I tried version 7.3.1 on Windows and was able to process all the images in question with no issue, with RAM usage never exceeding about 1.2GB. I think it might be something with Debian, so I will look into it and see what it could be... thank you for the help, and the fantastic program :)

aurbus commented 1 month ago

I know this issue has been inactive for months now, but in the meantime I have had the opportunity to do quite a bit of testing on a number of different machines, and I can confirm that this issue is definitely present and broader in scope than I thought.

The issue only seems to happen on Linux machines. I have only had the chance to test on Debian-based systems: Debian stable, testing, and unstable (with kernels 6.1 through 6.9), and MX Linux (which runs sysvinit, whereas the Debian machine uses systemd; I'm unsure about the kernel on the MX machine). Debian was running on a Dell XPS 13 9310 2-in-1 with an Intel i7-1165G7, 16GB of RAM (tested with between 1 and 12GB of swap), and a 512GB Kioxia SSD, with Debian installed on Btrfs.

The MX machine was an older Lenovo laptop (not a ThinkPad) with 12GB of physical RAM and 8GB of swap. I don't know any more details about it.

The issue does NOT just occur when outputting with OCR, but any time a PDF is being produced. It is greatly exacerbated by OCR (more RAM is consumed). The amount of RAM consumed depends on the resolution of the input images. The number of input images/pages in the PDF does not seem to affect the amount of RAM used, though if the number of pages is small (say, fewer than 50), the export finishes before RAM usage really gets into the stratosphere.
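To put numbers on "RAM depends on resolution, not page count", it helps to record the peak memory of each run separately. A minimal Linux-only Python sketch (an illustration, not anything NAPS2 provides; the helper name is hypothetical) that starts a command, waits for it, and reports its peak resident memory. For example, measure_peak_rss(["naps2"]), do one export in the GUI, then close the window, and repeat with lower- and higher-resolution scans.

```python
import os
import sys

def measure_peak_rss(argv: list[str]) -> tuple[int, float]:
    """Run argv to completion; return (exit code, peak resident memory in MiB). Linux only."""
    pid = os.posix_spawnp(argv[0], argv, os.environ)
    # wait4 returns resource usage for this specific child, so each run gets its
    # own figure (getrusage(RUSAGE_CHILDREN) would report the largest child so far).
    _, status, usage = os.wait4(pid, 0)
    # A negative exit code means "killed by that signal", e.g. -9 (SIGKILL) from the OOM killer.
    exit_code = os.waitstatus_to_exitcode(status)
    return exit_code, usage.ru_maxrss / 1024  # ru_maxrss is reported in KiB on Linux

if __name__ == "__main__":
    # Self-check: a child that allocates ~64 MiB should report a peak of at least that.
    code, peak = measure_peak_rss([sys.executable, "-c", "data = b'x' * (64 * 1024 * 1024)"])
    print(f"exit code {code}, peak RSS {peak:.0f} MiB")
```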

The issue is not present on Windows at all (again, tested across several different versions of Windows and hardware). A friend of mine has a Mac, and he says NAPS consistently uses about 8-10GB when outputting, but never more. Unfortunately, I can't test on a Mac myself. The issue is also present in every NAPS release I tested, from 7.3 up to the latest, 7.4.3.

Hopefully this sheds some new light on this, as it renders NAPS completely unusable on Linux for me.

Thank you for your great work on this little program... it is awesome!

cyanfish commented 1 month ago

Did you try running taskset -c 0-1 naps2?