espresso3389 / pdfrx

pdfrx is yet another PDF viewer implementation that built on the top of pdfium. The plugin currently supports Android, iOS, Windows, macOS, Linux, and Web.
MIT License
60 stars 36 forks source link

Speed up the first render/Give a chance to cache thumbnail #126

Open yhyh0 opened 1 month ago

yhyh0 commented 1 month ago

The changes on onViewerReady gives a chance to show a loading widget and to save the first page image as a cached thumbnail(no need to load the doc if need a thumbnail later). I thought adding a new onFirstPageRendered, but it might just makes more sense to postpone onViewerReady a bit waiting for the first render to make the 'ready' more ready.

yhyh0 commented 1 month ago

Also planning to change all cached images to pngBytes, the size would usually be less than 1/10 of rgba. But not in this PR for surel.

espresso3389 commented 1 month ago

I have completely no idea but how much does PNG decompression impacts the actual rendering time do you think? If it is feasible enough, I want to implement some code on PdfImage side; like compact or such.

espresso3389 commented 1 month ago

Ah, I see what you're talking about. If we have PNG compressed version of page images, we can take 10x more cached pages even with the same memory cost. So we don't have to access the actual PDF any longer.

espresso3389 commented 1 month ago

If we do such compression, I want to do that on native code side. The structure will get simpler.

yhyh0 commented 1 month ago

I have completely no idea but how much does PNG decompression impacts the actual rendering time do you think? If it is feasible enough, I want to implement some code on PdfImage side; like compact or such.

Yep, probably should apply to both scenarios as the flutter Image widget also only takes PNG bytes, should be just a fast one time convert for us. I will try to measure when we do the switch, but worst case we could grab the RGBA before converting to PNG.

yhyh0 commented 1 month ago

If we do such compression, I want to do that on native code side. The structure will get simpler.

hmm.. What do you mean by the 'native code side'? If it is the place of that getBytes line, yes I know that is ulgy and costs extra unnecessary cpu cycle, but just for the sake of the callbacks and limit the scope of this PR.

espresso3389 commented 1 month ago

hmm.. What do you mean by the 'native code side'? If it is the place of that getBytes line, yes I know that is ulgy and costs extra unnecessary cpu cycle, but just for the sake of the callbacks and limit the scope of this PR.

I think passing png bytes instead of ui.Image is bad dicision even though it may perform well on certain situation. I want to encapsulate such things. And, PdfImage is already such a encapsulation on raw C++ memory buffered RGBA image. PdfImage.createImage does C++ memory RGBA image to ui.Image conversion.

So, we can do everything inside PdfImage and there might be chance to implement everything in C++ code or anyway inside PdfImage.

The user don't have to know the implementation detail. The underlying data can be anything; RGBA or PNG or JPEG.

yhyh0 commented 1 month ago

hmm.. What do you mean by the 'native code side'? If it is the place of that getBytes line, yes I know that is ulgy and costs extra unnecessary cpu cycle, but just for the sake of the callbacks and limit the scope of this PR.

I think passing png bytes instead of ui.Image is bad dicision even though it may perform well on certain situation. I want to encapsulate such things. And, PdfImage is already such a encapsulation on raw C++ memory buffered RGBA image. PdfImage.createImage does C++ memory RGBA image to ui.Image conversion.

So, we can do everything inside PdfImage and there might be chance to implement everything in C++ code or anyway inside PdfImage.

The user don't have to know the implementation detail. The underlying data can be anything; RGBA or PNG or JPEG.

I see. That's a good idea, could also postpone the convert in some cases. I'm just not sure how to handle PDFImage's dispose in this case, we shouldn't just clone the entire object but also need to prevent the code in callback to save just the reference.

Please feel free to make the changes.

yhyh0 commented 1 month ago

hmm.. What do you mean by the 'native code side'? If it is the place of that getBytes line, yes I know that is ulgy and costs extra unnecessary cpu cycle, but just for the sake of the callbacks and limit the scope of this PR.

I think passing png bytes instead of ui.Image is bad dicision even though it may perform well on certain situation. I want to encapsulate such things. And, PdfImage is already such a encapsulation on raw C++ memory buffered RGBA image. PdfImage.createImage does C++ memory RGBA image to ui.Image conversion.

So, we can do everything inside PdfImage and there might be chance to implement everything in C++ code or anyway inside PdfImage.

The user don't have to know the implementation detail. The underlying data can be anything; RGBA or PNG or JPEG.

I took back what I said, compressing PNG while rendering does make it quite slow.. It shouldn't be done during rendering.

And Isolate.spawn might be too quite expensive for each render, rendering itself takes about 10-800ms, but usually it takes less than 30ms. The spawn adds up 50-400ms every time, and usually around 300ms. Please take that into considering when changing the rendering part.

espresso3389 commented 1 month ago

Each method call invokes pre-launched per-document Isolate using BackgroundWorker.compute so the cost is minimum.

yhyh0 commented 1 month ago

Each method call invokes pre-launched per-document Isolate using BackgroundWorker.compute so the cost is minimum.

That Isolate doesn't get launched until await _worker is called, meaning a new Isolate for every render. I was trying to reuse it, but encountered some memory issue on Android(we probably have to obtain a lock for more PDF calls, including load and dispose, also on pdfium instead of on document). I will try some more later.

espresso3389 commented 1 month ago

That Isolate doesn't get launched until await _worker is called, meaning a new Isolate for every render.

No, it's not correct. For the first occurrence of (await _worker), it may wait for the initialization but later calls just get the already-awaited-result immediately. No extra cost. Only the cost there is asynchronously send the request to the worker thread.

espresso3389 commented 1 month ago

Anyway, the latest master commit b126686d611e6d1eaf7ccf840df7417ab25b4cf5 reduces total number of isolates used for first initialization of PdfDocument instances.

Now, only one isolate is created on PdfDocument initialization and it is passed to the PdfDocument and PdfDocument reuses the same isolate for every API calls including render method.

image