ahrm / sioyek

Sioyek is a PDF viewer with a focus on textbooks and research papers
https://sioyek.info/
GNU General Public License v3.0

Keeping PDF open for days results in high Memory consumption #693

Open jash-maester opened 1 year ago

jash-maester commented 1 year ago

ISSUE: Leaving PDF files open on my work computer for more than 24 hours results in Sioyek consuming more than 3-4 GB of RAM, after which I have to close and re-open the same PDF file to bring memory usage back down. There could be some sort of caching, or some memory being allocated at runtime and never freed. What I can conclude is that a very small amount of memory is allocated every few minutes or so, which slowly accumulates into a large total. It pushes the browser and other applications into swap pretty easily.

I don't know how to debug the issue, but I would be interested in doing so if anyone can point me in the right direction. Also, I'm using the AppImage version, so hopefully anyone can replicate the issue.

ahrm commented 1 year ago

Are you using a single PDF file or opening multiple PDF files over the course of 24 hours?

HakonHarnes commented 1 year ago

You can debug this using tools such as Valgrind or cppcheck. Valgrind is dynamic and checks for leaks at runtime, while cppcheck only analyzes the code statically.

Valgrind

valgrind --leak-check=full \
      --show-leak-kinds=all \
      --track-origins=yes \
      --verbose \
      --log-file=valgrind.log \
      <path-to-sioyek-executable> <file.pdf>

You have to build Sioyek using the -g flag to obtain useful debug information (I modified the build_linux.sh script). Here is the leak summary for Sioyek:

==316370== LEAK SUMMARY:
==316370==    definitely lost: 410,192 bytes in 25 blocks
==316370==    indirectly lost: 1,368,575 bytes in 2,738 blocks
==316370==      possibly lost: 1,350,764 bytes in 366 blocks
==316370==    still reachable: 6,015,161 bytes in 3,142 blocks
==316370==                       of which reachable via heuristic:
==316370==                         length64           : 272 bytes in 4 blocks
==316370==         suppressed: 0 bytes in 0 blocks

See output here: https://drive.google.com/file/d/10eX5olI5-VUjJAWd4fGZp_uUucCq-pBj/view?usp=sharing
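To compare runs without reading the whole log, the LEAK SUMMARY block can be reduced to plain byte counts. The following is a small sketch, assuming a log in the format shown above (for example the valgrind.log written by the command earlier). The name summarize_leaks is a made-up helper for illustration, not a Valgrind feature.

```shell
# Hypothetical helper: print "<category><TAB><bytes>" for each LEAK SUMMARY
# category in a Valgrind log, with thousands separators removed.
# Usage: summarize_leaks valgrind.log
summarize_leaks() {
    awk '
        # Summary lines look like:
        #   ==PID==    definitely lost: 410,192 bytes in 25 blocks
        /(definitely|indirectly|possibly) lost:|still reachable:/ {
            kind = $2 " " $3
            sub(/:$/, "", kind)
            bytes = $4
            gsub(/,/, "", bytes)
            print kind "\t" bytes
        }
    ' "$1"
}
```

On the summary above this would print 410192, 1368575, 1350764 and 6015161 bytes for the four categories. The "definitely lost" and "indirectly lost" rows are the ones Valgrind treats as leaked; "still reachable" memory was still pointed to when the program exited.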

Cppcheck

cppcheck sioyek/pdf_viewer

See output here: https://drive.google.com/file/d/10ru1MeyEjK3yGqAgzozM1tXIaG9pDWg8/view?usp=sharing

HakonHarnes commented 1 year ago

The biggest offenders seem to be in the pdf_process_Do function in pdf_interpret.c:

==360950== 4,152,640 bytes in 3 blocks are still reachable in loss record 1,387 of 1,387
==360950==    at 0x4841888: malloc (vg_replace_malloc.c:393)
==360950==    by 0x37725B: do_scavenging_malloc (in /home/hakon/sioyek-fork/build/sioyek)
==360950==    by 0x377385: fz_malloc (in /home/hakon/sioyek-fork/build/sioyek)
==360950==    by 0x37DE10: fz_new_pixmap_with_data (in /home/hakon/sioyek-fork/build/sioyek)
==360950==    by 0x37DF4A: fz_new_pixmap (in /home/hakon/sioyek-fork/build/sioyek)
==360950==    by 0x361432: fz_decomp_image_from_stream (in /home/hakon/sioyek-fork/build/sioyek)
==360950==    by 0x361EF0: compressed_image_get_pixmap (in /home/hakon/sioyek-fork/build/sioyek)
==360950==    by 0x360DF4: fz_get_pixmap_from_image (in /home/hakon/sioyek-fork/build/sioyek)
==360950==    by 0x342F01: fz_draw_fill_image (in /home/hakon/sioyek-fork/build/sioyek)
==360950==    by 0x413774: fz_fill_image (in /home/hakon/sioyek-fork/build/sioyek)
==360950==    by 0x47394A: pdf_show_image (in /home/hakon/sioyek-fork/build/sioyek)
==360950==    by 0x4627EA: pdf_process_Do (in /home/hakon/sioyek-fork/build/sioyek)
==360950== 2,112,984 bytes in 2 blocks are still reachable in loss record 1,386 of 1,387
==360950==    at 0x4841888: malloc (vg_replace_malloc.c:393)
==360950==    by 0x37725B: do_scavenging_malloc (in /home/hakon/sioyek-fork/build/sioyek)
==360950==    by 0x377385: fz_malloc (in /home/hakon/sioyek-fork/build/sioyek)
==360950==    by 0x37DE10: fz_new_pixmap_with_data (in /home/hakon/sioyek-fork/build/sioyek)
==360950==    by 0x37DF4A: fz_new_pixmap (in /home/hakon/sioyek-fork/build/sioyek)
==360950==    by 0x361432: fz_decomp_image_from_stream (in /home/hakon/sioyek-fork/build/sioyek)
==360950==    by 0x361EF0: compressed_image_get_pixmap (in /home/hakon/sioyek-fork/build/sioyek)
==360950==    by 0x360DF4: fz_get_pixmap_from_image (in /home/hakon/sioyek-fork/build/sioyek)
==360950==    by 0x342F01: fz_draw_fill_image (in /home/hakon/sioyek-fork/build/sioyek)
==360950==    by 0x413774: fz_fill_image (in /home/hakon/sioyek-fork/build/sioyek)
==360950==    by 0x473795: pdf_show_image (in /home/hakon/sioyek-fork/build/sioyek)
==360950==    by 0x4627EA: pdf_process_Do (in /home/hakon/sioyek-fork/build/sioyek)

This seems to be related to images. Opening a PDF with no images leaks substantially less memory. @jash-maester is your issue limited to PDFs with images? Do PDFs with only text cause issues for you?
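To check how much of the reported memory passes through a particular function, the loss records can be totalled per function. This is a rough sketch, assuming loss-record headers and stack frames formatted as in the trace above. The name bytes_through is a made-up helper for illustration, and it only matches plain function names exactly (no spaces or template arguments).

```shell
# Hypothetical helper: sum the bytes of every Valgrind loss record whose
# stack contains the given function. Each record is counted at most once.
# Usage: bytes_through FUNCTION valgrind.log
bytes_through() {
    awk -v fn="$1" '
        # Record header, e.g.:
        #   ==PID== 4,152,640 bytes in 3 blocks are still reachable in loss record ...
        / bytes in .* blocks are .* in loss record / {
            cur = $2
            gsub(/,/, "", cur)
            counted = 0
            next
        }
        # Stack frame, e.g.:
        #   ==PID==    by 0x4627EA: pdf_process_Do (in /path/to/binary)
        ($2 == "at" || $2 == "by") && $4 == fn && !counted {
            total += cur
            counted = 1
        }
        END { printf "%.0f\n", total }
    ' "$2"
}
```

If the log contained only the two records shown above, running bytes_through pdf_show_image valgrind.log would report 6265624 bytes (4,152,640 + 2,112,984).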

jash-maester commented 1 year ago

@ahrm Opening a single PDF file or opening multiple PDF files has the same effect, i.e. RAM consumption keeps increasing.

jash-maester commented 1 year ago

@HakonHarnes I will get back to you tomorrow or in a few hours: I will open a PDF with only text and confirm whether this behaviour happens or not.

ahrm commented 1 year ago
jash-maester commented 1 year ago

@ahrm Yes, I actively use sioyek most of the time, and when I left it overnight it was also running in the foreground, although I use DWM with the slock lock screen and my screen was locked. I usually have a browser and sioyek open. I normally never leave things open when I go; I discovered this issue by accident.

prefs_user.config

search_url_g https://www.google.com/search?q=

startup_commands toggle_custom_color

#### LOAD DRACULA ##########
#source /home/jash/.config/sioyek/dracula.config

I have nothing else set, and I have even commented out the Dracula theme. I have not modified anything else from vanilla Sioyek.

ahrm commented 1 year ago

@HakonHarnes I will get back to you tomorrow or in a few hours: I will open a PDF with only text and confirm whether this behaviour happens or not.

Any updates on this?

jash-maester commented 1 year ago

@ahrm @HakonHarnes Sorry for the late reply, I was ill for the last few days. I can confirm that opening a text-only PDF does not cause memory leaks or runaway RAM usage. To test it, I created a PDF file with only text and opened it with Sioyek. I left it open for 2 days and memory usage did not grow by a single MB.
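For anyone who wants to repeat this kind of multi-day observation with numbers, here is a minimal sketch that logs the resident memory of a running process once per interval. It assumes Linux, since it reads the VmRSS field from /proc/<pid>/status. The function names vmrss_kb and log_rss, the default 60-second interval, and the rss.log file name are made up for illustration and are not part of Sioyek.

```shell
# Hypothetical helpers (not part of Sioyek); Linux only, since they rely on procfs.

# Print the VmRSS value (resident set size, in kB) found in a
# /proc/<pid>/status-style file. Fails if the file or the field is missing.
vmrss_kb() {
    [ -r "$1" ] || return 1
    while read -r key value _unit; do
        case "$key" in
            VmRSS:) echo "$value"; return 0 ;;
        esac
    done < "$1"
    return 1
}

# Append "<unix-timestamp> <rss-kB>" lines to a log until the process exits.
# Usage: log_rss PID [INTERVAL_SECONDS] [LOGFILE]
log_rss() (
    pid=$1
    interval=${2:-60}
    logfile=${3:-rss.log}
    while rss=$(vmrss_kb "/proc/$pid/status"); do
        printf '%s %s\n' "$(date +%s)" "$rss" >> "$logfile"
        sleep "$interval"
    done
)
```

For example, log_rss "$(pidof sioyek)" 300 sioyek-rss.log & would sample every five minutes, assuming a single sioyek process is running. A steadily rising second column in the log would correspond to the slow growth described at the top of this issue.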

HakonHarnes commented 1 year ago

I can confirm that opening a text-only PDF does not cause memory leaks or runaway RAM usage.

That confirms my suspicion that the memory leak is related to how images are handled. It also agrees with Valgrind, which reports that the biggest offenders are in the pdf_process_Do function (pdf_show_image) in pdf_interpret.c.

@ahrm Are images loaded and freed on demand? That is, the image is lazily loaded when viewing the page for the first time and freed later when viewing a different page. Could it be that the mechanism of freeing images is somehow not working as intended?

ahrm commented 1 year ago

@ahrm Are images loaded and freed on demand? That is, the image is lazily loaded when viewing the page for the first time and freed later when viewing a different page. Could it be that the mechanism of freeing images is somehow not working as intended?

No, we don't do anything special with images. We just use MuPDF's PDF rendering function. I have not been able to reproduce this issue even on PDFs with images. So I assume either some other bug in sioyek caused the leak, or maybe there was no leak at all? Note that when sioyek is used to open many documents, we don't release the previous documents (this is by design, so that, for example, going back to the previous document is fast and simple). Maybe the original memory usage was caused by sioyek having been used to open other documents before the last one?

@jash-maester can you still reproduce the bug on the original PDF which caused the issue?

HakonHarnes commented 1 year ago

I have not been able to reproduce this issue even on PDFs with images.

@ahrm This is on Windows, right? I can try to see whether I can replicate the issue on Arch Linux. Maybe it's Linux-specific, or specific to the AppImage build for Linux.

ahrm commented 1 year ago

@ahrm This is on Windows, right?

Yes.

HakonHarnes commented 8 months ago

The biggest offenders seem to be in the pdf_process_Do function in pdf_interpret.c:

This file is in MuPDF, so I wonder if the memory leak is due to a bug in MuPDF. I wonder if updating to the latest version of MuPDF will resolve the issue.

@ahrm Perhaps this is related: https://bugs.ghostscript.com/show_bug.cgi?id=705621.

ahrm commented 8 months ago

The biggest offenders seem to be in the pdf_process_Do function in pdf_interpret.c:

This file is in MuPDF, so I wonder if the memory leak is due to a bug in MuPDF. I wonder if updating to the latest version of MuPDF will resolve the issue.

@ahrm Perhaps this is related: https://bugs.ghostscript.com/show_bug.cgi?id=705621.

Wow, I opened that issue myself but had completely forgotten about it! Anyway, which version of sioyek are you using (main branch or development branch)? We don't use FZ_STEXT_PRESERVE_IMAGES in the main branch anymore, but we do use it again in the development branch. I have also upgraded MuPDF in the development branch (not to the newest version, but to a newer one).

HakonHarnes commented 8 months ago

My initial tests above were in April 2023 on the main branch. I use the development branch now. I haven't checked it for memory leaks, but I do recall Sioyek used ~6-8 GB of RAM the other day, which seemed a bit excessive.