bblanchon / pdfium-binaries

📰 Binary distribution of PDFium

PDFIUM is not working in Webkit(playwright) #158

Open KameshRajendran opened 2 months ago

KameshRajendran commented 2 months ago

We are encountering a problem with the pdfium module in WebKit during Playwright testing. The sample code below is designed to notify the client side once both the document and pdfium.wasm have been successfully loaded, so that nothing proceeds until both the pdfium module and the DOM are ready.

<!DOCTYPE html>
<html lang="en">
<head>
    <script src="pdfium.js"></script>

    <script>
        window.createDivElement = function (value) {
            var divElement = document.createElement("div");
            divElement.textContent = value;
            divElement.style = "Left:20px;Width: 300px; Height:50px"
            document.body.appendChild(divElement);
        }
        document.addEventListener("DOMContentLoaded", function () {  
            let pageLoaded = false;
            let moduleLoaded = false;
            window.createDivElement("Start DOM Loading...");
            // Module.onRuntimeInitialized is called by pdfium.js to signal that the runtime is ready for further processing
            Module.onRuntimeInitialized = async _ => {
                moduleLoaded = true;
                window.createDivElement("PDFIUM Module Loaded...");
                checkIfEverythingWasLoaded();
            };

            function checkIfEverythingWasLoaded() {
                if (pageLoaded && moduleLoaded) {
                    window.createDivElement("Both Page and Module loaded...");
                }
            }
            window.onload = function (e) {
                window.createDivElement("Page Loaded...");
                pageLoaded = true;
                checkIfEverythingWasLoaded();
            }
        });
    </script>
</head>

<body>
    @RenderBody()
    <script src="_framework/blazor.server.js"></script>
</body>
</html>

Please find the blazor sample: WEBKIT~1739968950.zip

The table below illustrates the outcomes of executing the aforementioned code in both Chrome and WebKit. In the case of WebKit (Playwright), the Module.onRuntimeInitialized event fails to trigger, preventing us from proceeding with subsequent steps to read the document.

chrome vs webkit

To read images, we use pdfium.wasm and load only the pdfium.js file into the application. The pdfium.js file, in turn, loads pdfium.wasm on its own and notifies the success handler for further processing. This mechanism works correctly in major browsers such as Chrome, Edge, Firefox, and Safari. However, it fails in the WebKit build that Playwright launches.

Upon closer examination, we found that pdfium.js uses WebAssembly.instantiateStreaming to read the .wasm file. In WebKit under Playwright, however, this call never invokes either the success or the failure handler.
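One way to confirm that the promise never settles is to race it against a timer. The sketch below is only an illustration: `withTimeout` and `instantiateWithFallback` are hypothetical helpers, not part of pdfium.js, and the 5000 ms limit is an arbitrary assumption. How to wire such a loader into the generated pdfium.js is not covered here.

```javascript
// Hypothetical helper: reject if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} did not settle within ${ms} ms`)),
      ms
    );
  });
  // Whichever settles first wins; the timer is always cleared afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical loader: try streaming instantiation first, then fall back to
// downloading the whole file and instantiating it from an ArrayBuffer.
async function instantiateWithFallback(url, imports, ms = 5000) {
  try {
    return await withTimeout(
      WebAssembly.instantiateStreaming(fetch(url), imports),
      ms,
      'instantiateStreaming'
    );
  } catch (err) {
    console.warn('Streaming instantiation failed, retrying from bytes:', err);
    const bytes = await (await fetch(url)).arrayBuffer();
    return WebAssembly.instantiate(bytes, imports);
  }
}
```

If the warning shows the timeout message, the streaming call really is hanging rather than failing, which narrows the problem to the engine rather than to the page code.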

Wekit-pdfium-issue

Can anyone point us in the right direction if you have any idea what causes this?

Note:

We used the commands below to run the application in WebKit with Node v16.20.1:

npx playwright install
npx playwright install webkit (if needed)
npx playwright wk http://localhost:7185/

KameshRajendran commented 2 months ago

https://bugs.chromium.org/p/pdfium/issues/detail?id=2134

bblanchon commented 2 months ago

Ping, @jerbob92.

KameshRajendran commented 2 months ago

@jerbob92 - Could you please help with this?

jerbob92 commented 2 months ago

@bblanchon @KameshRajendran I don't use the browser build of pdfium so I can't help out here

jerbob92 commented 2 months ago

I couldn't help myself and was curious about what Playwright is, but 2 minutes of searching told me that Playwright does not support WebAssembly: https://github.com/microsoft/playwright/issues/2876 https://github.com/microsoft/playwright/issues/14536

Are you sure this is supposed to work?

Edit: looks like Playwright uses WebKit 17.4, which does not have WebAssembly support on Windows.
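Whether a given browser build exposes WebAssembly at all can be checked from the page itself. The following is a minimal sketch; `describeWasmSupport` is a hypothetical helper name, and it only reports which entry points exist, not whether they work.

```javascript
// Hypothetical helper: report which WebAssembly entry points a global scope exposes.
function describeWasmSupport(scope = globalThis) {
  const wasm = scope.WebAssembly;
  return {
    hasWebAssembly: typeof wasm === 'object' && wasm !== null,
    hasInstantiate: typeof wasm?.instantiate === 'function',
    hasInstantiateStreaming: typeof wasm?.instantiateStreaming === 'function',
  };
}

// Log the result for the environment the script is running in.
console.log(describeWasmSupport());
```

Running this in the Playwright WebKit window and in Safari on Mac would show directly whether the two environments differ.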

KameshRajendran commented 2 months ago

@jerbob92, thanks for your update and for your interest in our issue with the WebAssembly version of pdfium.

I believed that WebKit lacked support only for Blazor WebAssembly. Our understanding was that WebKit fully supports native WebAssembly, as announced in their official blog post in 2017 (https://webkit.org/blog/7691/webassembly/).

On Mac, Safari uses WebKit, and in that environment we can successfully load the native WebAssembly components (pdfium.wasm and pdfium.js). I suspect that the issue lies with our pdfium.wasm, since it fails to load only with WebKit on Playwright.

Can you review the provided example and the steps I've outlined to reproduce the problem?

GokulprasathVenkatachalam commented 2 months ago

@jerbob92, I can successfully load native wasm files and retrieve information from them. The problem arises only when attempting to load or retrieve information from pdfium.wasm and pdfium.js using WebKit on Playwright.

As a comparison, we used the instantiate-streaming sample from the MDN webassembly-examples GitHub repository.

We opened the sample below from that repository, and we are able to load it and get the information in WebKit.

https://mdn.github.io/webassembly-examples/js-api-examples/instantiate-streaming.html

Please find the screenshot of the above sample in WebKit: image

jerbob92 commented 2 months ago

@KameshRajendran I think the issues on both Playwright and WebKit itself are pretty clear that WebAssembly was not enabled on Windows, so it makes sense that it would work in Safari on Mac. I don't have a Windows machine myself and also don't really have time to test this out for you.

It is quite unclear to me which version Playwright actually uses, so it might be that they are using a version that already has that change merged in.

It could also be that something inside the WebAssembly module is not supported. The WebAssembly example that you linked is very simple compared to pdfium.
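A quick way to test that theory is to ask the engine whether it accepts the binary at all, using `WebAssembly.validate`. This is a rough sketch: `checkWasmBytes` is a hypothetical helper, and the eight-byte module is only a stand-in for the real pdfium.wasm bytes.

```javascript
// Hypothetical helper: report whether the current engine accepts a wasm binary.
function checkWasmBytes(bytes) {
  if (typeof WebAssembly !== 'object') {
    // The engine exposes no WebAssembly API at all.
    return { supported: false, valid: false };
  }
  // validate() returns false for binaries the engine cannot compile.
  return { supported: true, valid: WebAssembly.validate(bytes) };
}

// Smallest valid module: the "\0asm" magic number followed by version 1.
const emptyModule = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);
console.log(checkWasmBytes(emptyModule));
```

In the browser, the same function could be fed the real file, for example with `fetch('pdfium.wasm').then(r => r.arrayBuffer()).then(checkWasmBytes)`. If the simple module validates but pdfium.wasm does not, the binary likely uses a feature that this WebKit build rejects.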

I'd suggest trying to get more information yourself on why the loading fails.