ajrcarey / pdfium-render

A high-level idiomatic Rust wrapper around Pdfium, the C++ PDF library used by the Google Chromium project.
https://crates.io/crates/pdfium-render

Add bindings to support Wasm WASI. #159

Open ajalt opened 2 weeks ago

ajalt commented 2 weeks ago

Hi, thanks for making this great library!

I'm trying to use this library on the wasm32-wasip1 target to run on a standalone Wasm runtime like Wasmtime. I can compile against that target, but it looks like all the code in src/wasm.js depends on the JS shims generated by wasm_bindgen, which aren't available on WASI.

Assuming I already have a WASI-compiled pdfium library, do you have an idea of what it would take to link pdfium-render to it? Is that even possible, or would it require changes in pdfium-render?

ajrcarey commented 2 weeks ago

Hi @ajalt, thank you for raising the issue. At the time I implemented the WASM bindings, I did look into supporting WASI as well, but it was never clear to me (this was a few years ago) how to connect two independent WASM modules together within wasmtime. Since that is essential for working with Pdfium, and since binding two modules together in the browser is (relatively) straightforward using wasm_bindgen, I focussed on the browser-based implementation.

I would be very happy to revisit this as I would like to support wasmtime as well. But I would need some pointers as to how to get two independently-compiled WASM modules to talk to each other in wasmtime.

ajalt commented 2 weeks ago

Emscripten has some docs on dynamic linking. One solution they mention is to include the library in the wasm filesystem, then link to it using regular dlopen.

Another less elegant option if that doesn't work might be to link manually by loading pdfium and pdfium-render in separate wasm instances, and importing all the pdfium exports explicitly when setting up the wasm engine:

let pdfium_render_import = imports! {
    "env" => {
        "PDFium_Init" => pdfium_library_exports.get_function("PDFium_Init")?,
        "FPDF_InitLibraryWithConfig" => pdfium_library_exports.get_function("FPDF_InitLibraryWithConfig")?,
        // etc...
    }
};

ajrcarey commented 2 weeks ago

If dlopen is indeed available, then that suggests that pdfium-render's default dynamic bindings should be workable. But how are you proposing to build Pdfium? Those Emscripten docs suggest that each module to be linked needs to be built with specific command-line parameters. Are you planning to build Pdfium yourself?

How were you envisioning this would work at runtime? Were you planning on having a Rust wrapper that would perform the module linking for you, a la https://docs.wasmtime.dev/examples-rust-linking.html, or did you intend to load the modules directly into wasmtime from the command line? How would that work?

I think I would need to see a minimal example of two modules linked together - doesn't need to be anything to do with Pdfium, just any two demo modules where module1::main() calls a test function exported by module2 - in order to proceed with any large scale Pdfium bindings implementation.

ajalt commented 2 weeks ago

Yes, I compiled pdfium against WASI myself. It just requires some extra command line flags vs wasm-js.

If you clone the pdfium-lib project, this patch should get it to compile against WASI (just follow their regular wasm build instructions after patching):

diff --git a/modules/wasm.py b/modules/wasm.py
--- a/modules/wasm.py
+++ b/modules/wasm.py
@@ -657,6 +657,11 @@ def run_task_generate():
                 "ASSERTIONS=1",
                 "-s",
                 "ALLOW_MEMORY_GROWTH=1",
+                "-s",
+                "STANDALONE_WASM=1",
+                "-sWASM_ASYNC_COMPILATION=0",
+                "-sWARN_ON_UNDEFINED_SYMBOLS=1",
+                "-sERROR_ON_UNDEFINED_SYMBOLS=0",
                 "-sMODULARIZE",
                 "-sEXPORT_NAME=PDFiumModule",
                 "-std=c++11",

Then you should be able to make a Rust cdylib crate, compiled for the wasm32-wasip1 target, that imports those functions. For example:

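// Imported from the host; resolved by name when the modules are linked.
extern "C" {
    fn FPDF_InitLibrary();
}

// Exported entry point that the host can look up by name.
#[no_mangle]
pub extern "C" fn main() {
    unsafe {
        FPDF_InitLibrary();
    }
}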

Then you should be able to use the Linker from the wasmtime docs you posted to run them together.
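
As a rough mental model of what that Linker step does, the sketch below resolves one module's imports against another module's exports by name. It is plain Rust with no wasmtime dependency, so it is a simulation rather than the wasmtime API: `library_exports`, `link`, and `fpdf_init_library_stub` are hypothetical names, and the stub only stands in for a function the Pdfium module would export. A real runtime also checks each import's signature, which this sketch ignores.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a function exported by the Pdfium module.
fn fpdf_init_library_stub() {}

// Exports of the "library" module, keyed by export name.
fn library_exports() -> HashMap<&'static str, fn()> {
    let mut exports: HashMap<&'static str, fn()> = HashMap::new();
    exports.insert("FPDF_InitLibrary", fpdf_init_library_stub);
    exports
}

// Resolve each imported name against the exports.
// Returns the resolved functions in import order, or the names
// that no export satisfies.
fn link(
    imports: &[&'static str],
    exports: &HashMap<&'static str, fn()>,
) -> Result<Vec<fn()>, Vec<&'static str>> {
    let mut resolved = Vec::new();
    let mut missing = Vec::new();
    for name in imports {
        match exports.get(name) {
            Some(f) => resolved.push(*f),
            None => missing.push(*name),
        }
    }
    if missing.is_empty() {
        Ok(resolved)
    } else {
        Err(missing)
    }
}

fn main() {
    let exports = library_exports();
    // The "application" module declares which names it needs.
    match link(&["FPDF_InitLibrary"], &exports) {
        Ok(funcs) => {
            // Call each resolved import once, as the application would.
            for f in &funcs {
                f();
            }
            println!("linked {} import(s)", funcs.len());
        }
        Err(missing) => println!("unresolved imports: {:?}", missing),
    }
}
```

The `Err` branch is the interesting one for this use case: any Pdfium function that the Rust module imports but the Pdfium build does not export would show up there, which is roughly the failure a runtime reports at instantiation.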


I imagine Emscripten's dlopen support involves baking the loading into the binary, so that's probably not available from regular Rust. It doesn't look like the libloading crate supports wasm, for example.

ajrcarey commented 2 weeks ago

Many thanks. And does that Linker example from the wasmtime docs itself get compiled to wasm, or is it built as a standard Rust executable? I feel like I need an ELI5 of what the end-to-end flow is meant to be for this so I can understand (a) the motivation and (b) how it could work with pdfium-render.

ajalt commented 2 weeks ago

The linker example is compiled to a regular rust binary: wasmtime is the engine/runtime for the wasm code you've compiled.

Basically, you would compile three things:

  1. Use Emscripten to compile the pdfium library to a standalone (WASI) Wasm module
  2. Use cargo to compile a Rust program that uses pdfium-render to work with a PDF, with --target wasm32-wasip1
  3. Use cargo to compile a second Rust program that uses wasmtime to link and run the two Wasm modules, compiled to any native target

Step 3 doesn't need to be in Rust; you could use any other wasmtime bindings, like Python or Go.

ajrcarey commented 1 week ago

Alright, that broadly makes sense to me.

If you would like me to lead the work on this, then you should prepare to wait; this would be a backlog item that would not start until after 0.9.0.

If you want to make a start on it yourself, I would be very happy to support you. Once it's obvious to me how to link the modules together, and what the calling mechanism would be on the pdfium-render side, I can probably pretty easily flesh out all the bindings. But a worked example from someone who is actually motivated to drive the feature would be very helpful.
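
For the calling mechanism on the pdfium-render side, one possible shape is to put the imported functions behind a small bindings trait and back it with C-ABI function pointers. The sketch below is an assumption, not pdfium-render's actual code: `Bindings`, `TableBindings`, `run`, and `init_library_stub` are hypothetical names, and the stub stands in for the real `FPDF_InitLibrary` import so the example runs natively without Pdfium.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical trait; a real bindings trait would cover the whole Pdfium API.
trait Bindings {
    fn init_library(&self);
}

// Counts calls to the stub so the effect is observable.
static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);

// Native stand-in for the imported FPDF_InitLibrary function.
extern "C" fn init_library_stub() {
    INIT_CALLS.fetch_add(1, Ordering::SeqCst);
}

// Bindings backed by a table of C-ABI function pointers.
struct TableBindings {
    init_library: extern "C" fn(),
}

impl Bindings for TableBindings {
    fn init_library(&self) {
        // Forward the call through the stored function pointer.
        (self.init_library)()
    }
}

// Library code only sees the trait, not how the functions were resolved.
fn run(bindings: &dyn Bindings) {
    bindings.init_library();
}

fn main() {
    let bindings = TableBindings {
        init_library: init_library_stub,
    };
    run(&bindings);
    println!("init calls: {}", INIT_CALLS.load(Ordering::SeqCst));
}
```

On a wasm32-wasip1 build, the table entry would instead wrap a function declared in an extern "C" block, like the FPDF_InitLibrary import shown earlier in the thread, so the host's linker fills it in when the module is instantiated.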