ArtifexSoftware / mupdf.js

JavaScript bindings for MuPDF
https://mupdfjs.readthedocs.io
GNU Affero General Public License v3.0

Support for other file formats like epub, xps, cbz, mobi, fb2, svg #115

Open arun-mani-j opened 1 month ago

arun-mani-j commented 1 month ago

Hi

First of all, thanks for mupdf.js (and that too WASM), it is super cool!

I come from a PyMuPDF background, where we could load epub, xps, cbz, mobi, fb2, and svg files and work with them just like PDF documents. Does mupdf.js work the same way? Can it load formats other than PDF?

I tried using https://github.com/ArtifexSoftware/mupdf.js/blob/master/examples/simple-viewer/index.html and modified the input element to accept epub files too. But when I open an epub file, I get this error in the console:

Uncaught (in promise) Error: cannot find document handler for file type: 'alice-in-wonderland.epub'
    5735545 http://0.0.0.0:8000/dist/mupdf-wasm.js:995
    runEmAsmFunction http://0.0.0.0:8000/dist/mupdf-wasm.js:4272
    _emscripten_asm_const_int http://0.0.0.0:8000/dist/mupdf-wasm.js:4275
    invoke_vi http://0.0.0.0:8000/dist/mupdf-wasm.js:5715
    libmupdf_wasm http://0.0.0.0:8000/dist/mupdf-wasm.js:705
    openDocument http://0.0.0.0:8000/dist/mupdf.js:1184
    openFile http://0.0.0.0:8000/:22

Perhaps I'm missing something and need to modify the script to make it support epub?

Thanks!

(Sorry if this is a duplicate; I searched the issues for "epub" and nothing came up.)

jamie-lemon commented 1 month ago

This comes down to the file size of the WASM library, which we try to keep as small as possible. See https://github.com/ArtifexSoftware/mupdf.js/blob/master/BUILDING.md#building: "In order to keep it as small as possible, it is built with a minimal feature set that excludes the more refined CJK fonts, PDF scripting, XPS format, and EPUB format support." I expect build options could be added to support these formats.
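Until such a build exists, a viewer could fail early with a clearer message than the raw "cannot find document handler" error. The following is only a minimal sketch: the helper names `fileExtension` and `unsupportedReason` are hypothetical, not part of mupdf.js, and the excluded set is an assumption taken from the BUILDING.md quote (XPS and EPUB). It says nothing about cbz, mobi, fb2, or svg, which would need to be checked against the actual build.

```javascript
// Assumption: formats the default minimal WASM build leaves out,
// based only on the BUILDING.md quote (XPS and EPUB).
const EXCLUDED_IN_MINIMAL_BUILD = new Set(["epub", "xps"]);

// Hypothetical helper: lowercased extension of a file name, or "" if none.
function fileExtension(name) {
  const dot = name.lastIndexOf(".");
  return dot > 0 ? name.slice(dot + 1).toLowerCase() : "";
}

// Hypothetical helper: returns null when the file is worth handing to the
// library, otherwise a human-readable reason it is expected to fail.
function unsupportedReason(name) {
  const ext = fileExtension(name);
  if (EXCLUDED_IN_MINIMAL_BUILD.has(ext)) {
    return `"${name}": .${ext} support is not compiled into the minimal build`;
  }
  return null;
}
```

In the simple-viewer example this check would run on the selected file's name before the document is opened, so the user sees the reason instead of a stack trace.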

robinwatts commented 1 month ago

In order to support epub, we'd need to include harfbuzz into the compilation. This would be the first library to be included that uses C++, albeit in the least impactful way possible.

I wouldn't be 100% sure that including C++ wouldn't cause problems with wasm. It might be fine, but the wasm ecosystem is immature enough that I wouldn't want to bet either way without trying.

julian-smith-artifex-com commented 1 month ago

PyMuPDF uses loads of C++ these days, and works fine on Pyodide, so things might work ok?

arun-mani-j commented 1 month ago

> In order to keep it as small as possible, it is built with a minimal feature set that excludes the more refined CJK fonts, PDF scripting, XPS format, and EPUB format support.

I don't know much about WASM, so just out of curiosity: is there any disadvantage to large WASM files (browser support, etc.) other than the potential network bandwidth?

If there is none, I could try to compile MuPDF.js with epub support (or, better, Artifex could provide a "full" variant, like npm i mupdf-full :see_no_evil:). I'm using MuPDF.js in my Tauri app, so network bandwidth is not an issue.

> In order to support epub, we'd need to include harfbuzz into the compilation

I share the same doubt, but as someone with no experience with WASM, I will leave it to the experts :sweat_smile:.

ccxvii commented 1 month ago

The main reason we don't ship everything is deployment size, which affects bandwidth and the startup time needed to load and compile the wasm bytecode to native code.
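The compile part of that startup cost can be measured directly with the standard WebAssembly JavaScript API. A rough sketch, not a benchmark of mupdf.js itself: `timeCompile` is a hypothetical helper, and the input here is the smallest valid module (just the header), standing in for the real mupdf-wasm binary, which you would read from disk or fetch instead. It uses the synchronous `WebAssembly.Module` constructor for simplicity; browsers may refuse that on the main thread for large modules, where the asynchronous `WebAssembly.compile` would be needed.

```javascript
// Smallest valid WebAssembly module: the "\0asm" magic followed by version 1.
// Stand-in for the real .wasm bytes, which are not loaded in this sketch.
const EMPTY_MODULE = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);

// Hypothetical helper: compile the bytes synchronously and report how long
// the compile step took, in milliseconds.
function timeCompile(bytes) {
  const start = performance.now();
  const compiled = new WebAssembly.Module(bytes);
  return { compiled, ms: performance.now() - start };
}

const result = timeCompile(EMPTY_MODULE);
console.log(`compiled ${EMPTY_MODULE.byteLength} bytes in ${result.ms.toFixed(2)} ms`);
```

Running the same helper on a minimal and a full build would give a concrete number for the startup difference on a given machine, which matters more for a desktop app like Tauri than download size does.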

Changing the build scripts to enable other components such as XPS and EPUB should just work.

However, dealing with multiple build configurations and deployments in NPM is a nightmare I'd very much like to avoid.