andytango / mupdf-js

📰 Yet another WebAssembly PDF renderer for Node and the browser
https://andytango.github.io/mupdf-js-demo/index.html
GNU Affero General Public License v3.0

📰 MuPDF.js

📢 IMPORTANT NOTE

There is now an official package supported by the MuPDF developers, which is likely to be more up-to-date and better maintained.

👉 You can find it here.

I recommend migrating to the official package, as I am no longer maintaining this one due to time constraints.

To nudge you in the right direction, I've pushed a major version of this package that is a stub and will throw an error if you try to use it.

If you really want to use this package (for example, because legacy code depends on it and you can't update that code), you can still install version 1.1.1 explicitly, e.g. npm i mupdf-js@1.1.1 or yarn add mupdf-js@1.1.1.


This is a port of MuPDF to JavaScript and WebAssembly.

🏁 Getting Started

yarn add mupdf-js
# or
npm i mupdf-js

Basic Usage

Before you do any processing, you'll need to initialise the MuPdf library:

import { createMuPdf } from "mupdf-js";

async function handleSomePdf(file: File) {
  const mupdf = await createMuPdf();

  //...
}
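createMuPdf() is asynchronous, and initialising a WebAssembly module is usually not free. If handleSomePdf may run many times, one option is to create the instance once and share it. The sketch below is a generic memoising wrapper; the helper name once is hypothetical (it is not part of mupdf-js), and it assumes a single MuPdf instance can safely serve several documents, which this README does not state.

```typescript
// Hypothetical helper (not part of mupdf-js): memoises an async factory such
// as createMuPdf so that repeated calls share a single initialisation.
function once<T>(factory: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached !== undefined) return cached;
    const pending = factory();
    cached = pending;
    // If initialisation fails, forget the rejected promise so a later call can retry.
    pending.catch(() => {
      if (cached === pending) cached = undefined;
    });
    return pending;
  };
}

// Usage, assuming createMuPdf is imported from "mupdf-js" as above:
// const getMuPdf = once(() => createMuPdf());
// const mupdf = await getMuPdf();
```

Because the wrapper caches the promise rather than the resolved instance, callers that arrive before initialisation finishes all await the same promise instead of starting a second initialisation.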

In the browser, you'll most likely get a File or Blob object from an HTML <input type="file"> element, supplied by the user.

You'll need to convert the file first to an ArrayBuffer, then to a Uint8Array:

import { createMuPdf } from "mupdf-js";

async function handleSomePdf(file) {
  const mupdf = await createMuPdf();
  const buf = await file.arrayBuffer();
  const arrayBuf = new Uint8Array(buf);

  //...
}
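Because the file comes from the user, it may not be a PDF at all. A cheap check before calling mupdf.load() is to look for the %PDF- header that PDF files start with. The helper looksLikePdf is hypothetical; it scans the first 1024 bytes because some readers tolerate a small amount of junk before the header, and it only checks the header, not whether the rest of the file is valid.

```typescript
// Hypothetical helper (not part of mupdf-js): returns true if the bytes
// contain the "%PDF-" header within the first `searchWindow` bytes.
const PDF_HEADER = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-" in ASCII

function looksLikePdf(bytes: Uint8Array, searchWindow = 1024): boolean {
  // Last offset at which a complete header could still start.
  const lastStart = Math.min(bytes.length - PDF_HEADER.length, searchWindow);
  for (let start = 0; start <= lastStart; start++) {
    let matches = true;
    for (let i = 0; i < PDF_HEADER.length; i++) {
      if (bytes[start + i] !== PDF_HEADER[i]) {
        matches = false;
        break;
      }
    }
    if (matches) return true;
  }
  return false;
}

// Usage inside handleSomePdf, before mupdf.load():
// const arrayBuf = new Uint8Array(await file.arrayBuffer());
// if (!looksLikePdf(arrayBuf)) throw new Error("The selected file is not a PDF");
```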

Once you have this, you can load the file into the MuPdf environment, creating a MuPdf document:

import { createMuPdf } from "mupdf-js";

async function handleSomePdf(file) {
  const mupdf = await createMuPdf();
  const buf = await file.arrayBuffer();
  const arrayBuf = new Uint8Array(buf);
  const doc = mupdf.load(arrayBuf);
}

You now have three different options to render the PDF document:

import { createMuPdf } from "mupdf-js";

async function handleSomePdf(file) {
  const mupdf = await createMuPdf();
  const buf = await file.arrayBuffer();
  const arrayBuf = new Uint8Array(buf);
  const doc = mupdf.load(arrayBuf);

  // Each of these returns a string:

  const png = mupdf.drawPageAsPNG(doc, 1, 300);
  const svg = mupdf.drawPageAsSVG(doc, 1);
  const html = mupdf.drawPageAsHTML(doc, 1);
}
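If your UI lets the user pick an output format, the three calls can sit behind one function. In this sketch, PageRenderer, OutputFormat and renderPage are hypothetical names; the interface describes only the three methods shown above with the argument lists used in these examples, and the document type is left generic because this README does not name it.

```typescript
type OutputFormat = "png" | "svg" | "html";

// Hypothetical structural type covering only the three documented methods.
interface PageRenderer<D> {
  drawPageAsPNG(doc: D, page: number, resolution: number): string;
  drawPageAsSVG(doc: D, page: number): string;
  drawPageAsHTML(doc: D, page: number): string;
}

// Hypothetical helper: dispatches to the renderer for the chosen format.
function renderPage<D>(
  mupdf: PageRenderer<D>,
  doc: D,
  page: number,
  format: OutputFormat,
  resolution = 300, // only used for PNG; 300 matches the example above
): string {
  switch (format) {
    case "png":
      return mupdf.drawPageAsPNG(doc, page, resolution);
    case "svg":
      return mupdf.drawPageAsSVG(doc, page);
    case "html":
      return mupdf.drawPageAsHTML(doc, page);
    default: {
      // Compile-time exhaustiveness check: adding a format without a case fails here.
      const unreachable: never = format;
      throw new Error(`Unknown format: ${String(unreachable)}`);
    }
  }
}

// Usage: const output = renderPage(mupdf, doc, 1, "svg");
```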

Conversion Options

PNG

mupdf.drawPageAsPNG(document, page, resolution);

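Arguments: document, the document returned by mupdf.load(); page, the page number to render (1 in the example above); resolution, the rendering resolution (300 in the example above).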

Returns: an uncompressed PNG image, encoded as a base64 data URI.
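To pick a resolution, it can help to estimate how large the rendered image will be. PDF page sizes are measured in points (1/72 inch), so assuming resolution is in dots per inch (this README does not state the unit), a page's pixel size is roughly its size in points × resolution / 72. estimatePngSize is a hypothetical helper, and the page size has to come from elsewhere, since none of the calls documented here return it.

```typescript
// Hypothetical helper: approximate pixel dimensions of a page drawn by
// drawPageAsPNG, assuming `resolution` is in dots per inch.
// 1 PDF point = 1/72 inch, so pixels = points * dpi / 72.
function estimatePngSize(
  pageWidthPt: number,
  pageHeightPt: number,
  resolution: number,
): { width: number; height: number } {
  return {
    width: Math.round((pageWidthPt * resolution) / 72),
    height: Math.round((pageHeightPt * resolution) / 72),
  };
}
```

For a US Letter page (612 × 792 points) at 300 DPI this gives 2550 × 3300 pixels; MuPDF may round the page edges slightly differently, so treat the result as an estimate.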

SVG

mupdf.drawPageAsSVG(document, page);

Arguments: document, the document returned by mupdf.load(); page, the page number to render.

Returns: an SVG file with the PDF document rendered as image tiles.
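Since drawPageAsSVG returns the SVG as a string, one way to display it is through an <img> element via a data URI, which keeps the markup out of your page's DOM; browsers do not run scripts inside SVG loaded as an image. svgToDataUri is a hypothetical helper, and for very large pages a Blob with URL.createObjectURL() may be a better fit than a long data URI.

```typescript
// Hypothetical helper: wraps an SVG string in a percent-encoded data URI
// suitable for an <img> element's src attribute.
function svgToDataUri(svg: string): string {
  return "data:image/svg+xml;charset=utf-8," + encodeURIComponent(svg);
}

// Usage in the browser:
// const img = document.createElement("img");
// img.src = svgToDataUri(mupdf.drawPageAsSVG(doc, 1));
// document.body.appendChild(img);
```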

HTML

mupdf.drawPageAsHTML(document, page);

Arguments: document, the document returned by mupdf.load(); page, the page number to render.

Returns: an HTML file that uses absolutely positioned elements for layout.

Text operations

Get text from page

mupdf.getPageText(document, page);

Arguments: document, the document returned by mupdf.load(); page, the page number to read text from.

Returns: a string containing all of the text collected from the page.
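To collect text from a whole document, call getPageText once per page. The functions documented here do not include a page-count call, so this sketch takes pageCount as a parameter you must supply; it also assumes page numbers start at 1, as the examples above (which pass 1) suggest. TextExtractor and getDocumentText are hypothetical names.

```typescript
// Hypothetical structural type: only the documented getPageText method.
interface TextExtractor<D> {
  getPageText(doc: D, page: number): string;
}

// Hypothetical helper: returns one string per page, in page order.
// pageCount must come from the caller; pages are assumed to be numbered from 1.
function getDocumentText<D>(mupdf: TextExtractor<D>, doc: D, pageCount: number): string[] {
  const pages: string[] = [];
  for (let page = 1; page <= pageCount; page++) {
    pages.push(mupdf.getPageText(doc, page));
  }
  return pages;
}

// Usage: const fullText = getDocumentText(mupdf, doc, pageCount).join("\n\n");
```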

Search on the page

mupdf.searchPageText(document, page, searchString, maxHits);

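Arguments: document, the document returned by mupdf.load(); page, the page number to search; searchString, the text to search for; maxHits, the maximum number of matches to return.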

Returns: an array of rectangles covering the text matches found ({x: number, y: number, w: number, h: number}[]).
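To highlight matches on top of an image from drawPageAsPNG, the rectangles need converting to pixel coordinates. This sketch assumes the rectangles are in PDF points (1/72 inch) and that resolution is in dots per inch; this README confirms neither unit, so check the output against a rendered page before relying on it. MatchRect and toPixelRect are hypothetical names.

```typescript
// Shape of each search result, as documented for searchPageText.
interface MatchRect {
  x: number;
  y: number;
  w: number;
  h: number;
}

// Hypothetical helper: scales a match rectangle from PDF points to pixels of
// a PNG drawn at `resolution` DPI. Both units are assumptions.
function toPixelRect(rect: MatchRect, resolution: number): MatchRect {
  const toPx = (points: number): number => (points * resolution) / 72;
  return { x: toPx(rect.x), y: toPx(rect.y), w: toPx(rect.w), h: toPx(rect.h) };
}

// Usage: const boxes = mupdf.searchPageText(doc, 1, "invoice", 100).map((r) => toPixelRect(r, 300));
```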

You should set maxHits to a level that a user would expect (for example, 100), or let users set their own limit. Alternatively, if you want to allow effectively unlimited search hits (at the risk of running out of memory), you can set it to the maximum value of a C unsigned 32-bit integer (UINT32_MAX), which is 4294967295.
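That advice can be wrapped in a small helper that turns an optional user-supplied limit into a valid maxHits value. resolveMaxHits is hypothetical; the default of 100 follows the example above, and clamping to at least 1 is a design choice so that a search can always report a match if there is one.

```typescript
// C's UINT32_MAX: the "effectively unlimited" value described above.
const UINT32_MAX = 4294967295;
const DEFAULT_MAX_HITS = 100;

// Hypothetical helper: converts an optional user limit into a whole number
// between 1 and UINT32_MAX, falling back to the default when absent or NaN.
function resolveMaxHits(userLimit?: number): number {
  if (userLimit === undefined || Number.isNaN(userLimit)) return DEFAULT_MAX_HITS;
  const whole = Math.floor(userLimit);
  return Math.min(Math.max(whole, 1), UINT32_MAX);
}

// Usage: const hits = mupdf.searchPageText(doc, 1, query, resolveMaxHits(userLimit));
```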

Contributing

See CONTRIBUTING.md

License

AGPL, subject to the MuPDF license.