dealfonso / pdfjs-viewer

A viewer based on PDFjs, which can be embedded in any web page (not using iframes)
Apache License 2.0

Adding Find to the viewer #6

Closed bentalentville closed 3 months ago

bentalentville commented 8 months ago

I figured out how to integrate your simple viewer into my website, but I am wondering how I would extend it to add a Find function. I believe pdfjs has a built-in find controller (I used it with the old 2.x pdfjs version in my simpleviewer implementation), but I am not sure how I would use it through your viewer, as you did not add it to the JS code.

dealfonso commented 7 months ago

Hi,

The library does not include any search utility, but one is easy to implement, much as the find controller does:

Assuming that you have something like let pdfViewer = new PDFjsViewer(...), you can extract the text from each page and search it to identify the page(s) on which the text is found:

// Note: await only works inside an async function (or a module)
for (let iPage = 0; iPage < pdfViewer.pdf.numPages; iPage++) {
    // pdf.js numbers pages starting at 1
    let currentPage = await pdfViewer.pdf.getPage(iPage + 1);
    let currentText = await currentPage.getTextContent();
    // Join the text items of the page, keeping the line breaks
    let buffer = [];
    for (let item of currentText.items) {
        buffer.push(item.str);
        if (item.hasEOL) { buffer.push("\n"); }
    }
    let textInThePage = buffer.join("");

    /** search the text in the page */
}
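
As a rough sketch of how the search step above might be filled in (this is not something the viewer provides): the hypothetical helper findPagesWithText below takes the pdf.js document (pdfViewer.pdf above) and a query, and returns the 1-based numbers of the pages whose text contains it. The helper names and the case-insensitive matching are assumptions made for illustration; only getPage, getTextContent, items, str and hasEOL come from pdf.js.

```javascript
// Hypothetical helper: joins the text items of one page into a single string.
// `page` is expected to behave like a pdf.js page (it has getTextContent()).
async function extractPageText(page) {
    const content = await page.getTextContent();
    const buffer = [];
    for (const item of content.items) {
        buffer.push(item.str);
        if (item.hasEOL) { buffer.push("\n"); }
    }
    return buffer.join("");
}

// Hypothetical helper: returns the 1-based numbers of the pages containing `query`.
// Matching is case-insensitive (an assumption; adjust as needed).
async function findPagesWithText(pdf, query) {
    const needle = query.toLowerCase();
    const pages = [];
    if (needle === "") { return pages; }
    for (let pageNumber = 1; pageNumber <= pdf.numPages; pageNumber++) {
        const page = await pdf.getPage(pageNumber);
        const text = (await extractPageText(page)).toLowerCase();
        if (text.includes(needle)) { pages.push(pageNumber); }
    }
    return pages;
}
```

It could be called as let pages = await findPagesWithText(pdfViewer.pdf, "some text"). Note that a phrase that wraps onto the next line will not match, because a "\n" sits between the two halves.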

bentalentville commented 7 months ago

I guess knowing whether text is on a page is one thing (to be able to bring up a page on which the text appears); showing it highlighted on-screen is another. Early versions of pdfjs (which I have been using) had a text layer overlaid on top of the canvas. Perhaps the V3.x that I am using doesn't do that any more, as I cannot click and select anything on the screen (it is just a canvas without anything overlaid). Not a critical function, perhaps.

One other question for you: right now the displayed text is pretty fuzzy. The resolution is just not great, and not as good as in the pdfjs 1.x I have been using. This might be outside the basic scope of your viewer, but is there a quick tweak I can make to the _renderPage function in the viewer JS to make the text sharper, even if it adds to memory usage? I am using the CDN-based pdfjs code, so I can't just tweak that code. Is it the renderContext object that can be changed to be similar to what is in the pdfjs viewer file?

dealfonso commented 7 months ago

Hi, you are right. The resolution was poor, so I have added a renderingScale parameter to the viewer. It defaults to 1.5, which gives a better resolution.
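
To illustrate what a rendering scale does in plain pdf.js terms (a sketch only, not the viewer's actual _renderPage code): the page is drawn into a canvas whose pixel size is the page size multiplied by the scale, while the canvas is displayed at the unscaled size. renderPageAtScale and its parameters are hypothetical names; getViewport and render are the pdf.js page methods.

```javascript
// Hypothetical helper: renders a pdf.js page into `canvas` at `renderingScale`
// times its natural size, while displaying it at the natural size.
async function renderPageAtScale(page, canvas, renderingScale = 1.5) {
    const displayViewport = page.getViewport({ scale: 1 });
    const renderViewport = page.getViewport({ scale: renderingScale });

    // Backing store: more pixels than are shown, so the text looks sharper.
    canvas.width = Math.floor(renderViewport.width);
    canvas.height = Math.floor(renderViewport.height);

    // Displayed size: unchanged, so the page layout stays the same.
    canvas.style.width = Math.floor(displayViewport.width) + "px";
    canvas.style.height = Math.floor(displayViewport.height) + "px";

    const renderContext = {
        canvasContext: canvas.getContext("2d"),
        viewport: renderViewport,
    };
    // page.render() returns a task; its `promise` resolves when drawing is done.
    await page.render(renderContext).promise;
    return canvas;
}
```

The number of pixels in the canvas grows with the square of the scale, so a scale of 2 uses roughly four times the canvas memory of a scale of 1.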

Please try the latest version in the CDN, as it should include the updates.

(*) Requesting version 1.1 does not work for me right now; maybe the CDN cache will update soon. In the meantime you can try the specific version 1.1.2: https://cdn.jsdelivr.net/gh/dealfonso/pdfjs-viewer@1.1.2/pdfjs-viewer.js