ARK-Builders / ARK-Navigator

Android app for navigation through your data
MIT License
15 stars 15 forks source link

#153: Faster PDF preview generation #271

Closed kirillt closed 1 year ago

kirillt commented 2 years ago

153

Adopting new PDF functionality from arklib See https://github.com/ARK-Builders/arklib/issues/11 and https://github.com/ARK-Builders/arklib-android/issues/7 for details

kirillt commented 2 years ago

Awesome that we touched base with and wired some advanced stuff from Rust side to Kotlin. But here is performance degradation, at the moment.

Manual testing on a folder with many big PDF files produced the following results:

The folder contains 469 documents, size of each is from 41KB to 21MB.

kirillt commented 2 years ago

PDF archive for testing (487MB): https://www.taran.space/dev/pdf-test.zip

j4w3ny commented 2 years ago

Awesome that we touched base with and wired some advanced stuff from Rust side to Kotlin. But here is performance degradation, at the moment.

Manual testing on a folder with many big PDF files produced the following results:

* **this branch: 115 seconds**

* **`main` branch: 40 seconds**

The folder contains 469 documents, size of each is from 41KB to 21MB.

We could switch to Medium Quality for rendering preview. I have tested it and it indicated that it will significantly cut down the time cost while providing previews with similar quality.

kirillt commented 1 year ago

State of the things: after recent changes in arklib Navigator from this PR hangs up during rendering previews for PDFs. Sometimes, after 1 document, sometimes after 10.

If we look closer at this thread: https://github.com/ajrcarey/pdfium-render/issues/59, we'll notice that changes in arklib are almost identical to the code sample kindly provided by @ajrcarey :) And he put it clearly that this snippet isn't tested and that Pdfium itself is not thread-safe :) Although it could work.

I suggest to do the simplest thing for this moment and ensure that all PDF previews are generated sequentially. We could batch PDF resources into separate set and process it from a dedicated thread/coroutine. @mdrlzy probably you could implement it?

kirillt commented 1 year ago

Also this feature could help us to debug this https://github.com/ARK-Builders/ark-components/issues/61

kirillt commented 1 year ago

Sequential previews generation benchmark:

OS: Android 11 Device: Samsung Galaxy Note 10+ Benchmark data: 477 PDF files, 857 MB total

Results (total indexing time, previews/thumbnails generation is the bottleneck): main: 1 min 5 sec this PR (high): 9 min 43 sec this PR (medium): 1 min 4 sec

kirillt commented 1 year ago

Data size of outputs in sequential tests: main: 84.7 MB this PR (medium): 87.4 MB

Data sizes of individual previews slightly differ, some are heavier in this PR and some are lighter comparing to main.

Dimensions of the previews are the same in both versions. I don't see significant difference in quality of previews.

Samples of the previews (click to see in the original size):

(main)

(this PR)

Previews of landscape-oriented documents are generated by this PR version in portrait layout, they are just turned clockwise. Not sure if it is a problem, anyway should be possible to fix easily:

(main)

(this PR)

kirillt commented 1 year ago

I think we should merge this PR, but also implement:

We need to take this task in-scope:

mdrlzy commented 1 year ago

Previews of landscape-oriented documents are generated by this PR version in portrait layout, they are just turned clockwise. Not sure if it is a problem, anyway should be possible to fix easily:

Maybe wrong orientation is because of this? .rotate_if_landscape(PdfBitmapRotation::Degrees90, true); https://github.com/ARK-Builders/arklib/blob/868f4586ddc62c8f438a2cd8ddd0d527f3af1452/src/pdf.rs#L45

kirillt commented 1 year ago

@mdrlzy I just realized that we can implement sequential rendering of PDF previews simply using a shared atomic variable PDF_JOBS. Each PDF coroutine checks PDF_JOBS and if it is equal than pre-defined limit (1 or 2) then the coroutine sleeps. Should be easier than doing some thread-pool magic. But the variable must be atomic (i.e. support atomic increment/decrement/equality-check).

kirillt commented 1 year ago

Superseded https://github.com/ARK-Builders/ARK-Navigator/pull/329