Reduce memory consumption by unloading page/document objects

espresso3389 / pdfrx

pdfrx is yet another PDF viewer implementation that built on the top of pdfium. The plugin currently supports Android, iOS, Windows, macOS, Linux, and Web.

MIT License

60 stars 36 forks source link

Reduce memory consumption by unloading page/document objects #125

Closed yhyh0 closed 1 month ago

yhyh0 commented 1 month ago

This PR is not ready to be merged. The work is not complete, and might not be the final solution. It does provide some promising result though.

In a situation where a 3k x 4k page is being zoom to a scale of 6(with the partial image commit). This is before:

And this is after:

Thoughts?

I should probably also try closing the doc early.

espresso3389 commented 1 month ago

I think "close" should be optional regardless of whether it is the default or not.

We should have some kind of flag named "keepMemoryCompact" or such to optimized for low memory consumption but user can disable it for performance even though I don't know how much we can get the performance gain if keep the cache...

espresso3389 commented 1 month ago

So, in the code we should have some functions like

ensurePageAvailable or such to open the page handle before doing some task
releasePageIfNeeded to close the page if the low memory option is enabled

yhyh0 commented 1 month ago

I think "close" should be optional regardless of whether it is the default or not.

We should have some kind of flag named "keepMemoryCompact" or such to optimized for low memory consumption but user can disable it for performance even though I don't know how much we can get the performance gain if keep the cache...

The performance might not worth to have one more exposed param.. Even if there is performance issue on not having cache, we probably should think of something else.

Now the memory usage easily goes up to 2g when I display a 1k x 2k page with only a few thumbnails, the app gets killed too often.

yhyh0 commented 1 month ago

@espresso3389 Now I'm trying to figure out a way to make the doc "on-demand" as well.. One doc takes about 100mb in memory in my case, what if there are a few PDFViewers for different files..

yhyh0 commented 1 month ago

Tried a couple things, basically the real issue is the doc.. loadPage only takes 5-15ms, rendering takes 5-500ms usually depends on how large the page is, loadDoc takes 1-3 seconds..

I think it is pretty safe to say that making the pages "on-demand" should be alright, considering that the quite a few hundred MB's memory and the crashes it saves.

The doc uses about 100mb memory on each viewer/file. It would be better that it also goes away, but that is indeed a tradeoff. I ended up adding a flag keepsDocInMemory to DocumentRef, and also changed its == behavior, quite a lot of changes...

espresso3389 commented 1 month ago

@yhyh0 Simple question. What kind of PDF document are you using; how many pages, how much size does it have. For general PDF files, loadDoc does not take so long.

So, my feeling is that your strategy seems better then the current structure. And I'm also interested in purging PdfDocument from the memory. My suggestion is that we should measure the first document loading time. And if the loading time is less than some threshold, we uses close-everytime strategy; if not, we use the current keep-document-open strategy.

espresso3389 commented 1 month ago

And, anyway, could you please revert the documentRef changes? I want to merge the page unloading code at first. And for the document, we will do it after the merge.

espresso3389 commented 1 month ago

Anyway, PdfDocumentRef structure is not simple. It has password dialog interaction part and if we re-open the document frequently without user interaction, we should introduce some password cache mechanism at first (at least). Or we should re-architecture the things to handle the things better...

yhyh0 commented 1 month ago

And, anyway, could you please revert the documentRef changes? I want to merge the page unloading code at first. And for the document, we will do it after the merge.

I tried a few PDF files, from 2-200MB in size. It took 1 second also for the smaller one, maybe since I load/unload the doc on every function and that the lock also makes some of them take longer, but still in that seconds range for sure.

I think whether to keep the doc might be more about how we use the viewer. There are book covers, where I only need to render the doc once, hence not keeping the doc. But for full screen view that people interact a lot, even 200ms might be too much a delay..

The PR should be good for unloading the pages. I just haven't cleaned all the places we stored the page handle, but if we are going to make more changes and refactors, that is kinda acceptable for now?

yhyh0 commented 1 month ago

Anyway, PdfDocumentRef structure is not simple. It has password dialog interaction part and if we re-open the document frequently without user interaction, we should introduce some password cache mechanism at first (at least). Or we should re-architecture the things to handle the things better...

It's complex indeed... I simply saved a builder closure wrapping await openPdfDocument(password) in it to make my POC work, but far from ready for a PR, quite a lot of things to consider

espresso3389 commented 1 month ago

I've fixed several isolate boundary errors on your code.

espresso3389 commented 1 month ago

1.0.55 that contains the PR is released!