[BUG] Data URL has limits in Chrome for PDFs

m-schubert commented 4 years ago

Describe the bug

It looks like embedding a PDF in a tiddler breaks for PDFs larger than 2MB. The PDF tiddlers are created and saved fine, but the embedded PDF viewer fails when trying to view them.

I've traced this down to the usage of data URLs. Supposedly Chrome has a 2MB limit, although it seems to ignore this for images.
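For a sense of scale, here is a rough sketch of how long the data URL for a PDF of a given size becomes. Base64 encodes every 3 bytes as 4 characters, so a 2MB file turns into roughly 2.8 million characters before the prefix is added. The helper names and the 2MB threshold are illustrative assumptions based on the report above, not documented Chrome constants.

```javascript
// Estimate the length of a base64 data URL for a payload of `byteLength` bytes.
function estimateDataUrlLength(byteLength, mimeType = "application/pdf") {
    const prefix = "data:" + mimeType + ";base64,";
    // Base64 emits 4 characters per 3-byte group, padding the last group.
    const base64Length = 4 * Math.ceil(byteLength / 3);
    return prefix.length + base64Length;
}

// Assumed threshold taken from the report above; not a documented constant.
const ASSUMED_LIMIT = 2 * 1024 * 1024;

// True when the estimated data URL would be longer than the assumed limit.
function exceedsAssumedLimit(byteLength) {
    return estimateDataUrlLength(byteLength) > ASSUMED_LIMIT;
}
```

Note that because of the 4/3 inflation, a file somewhat smaller than 2MB would already cross a 2MB limit on the URL itself.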

To Reproduce

Steps to reproduce the behavior:

1. Drag and drop a PDF greater than 2MB onto a TiddlyWiki
2. Import the PDF
3. Attempt to view the PDF. All you should see is a blank tiddler

Expected behavior

I'd expect to see the PDF.

Desktop (please complete the following information):

OS: Arch Linux
Browser: Chromium 81

A fix

I can work around the problem by editing the $:/core/modules/parsers/pdfparser.js shadow tiddler and changing the code to use URL.createObjectURL instead of a data URL.

(function(){

// Parser for application/pdf tiddlers: emits an <embed> element whose src
// points at the PDF
var ImageParser = function(type,text,options) {
    var element = {
            type: "element",
            tag: "embed",
            attributes: {}
        };
    if(options._canonical_uri) {
        // External tiddler: use its URI directly
        element.attributes.src = {type: "string", value: options._canonical_uri};
    } else if(text) {
        // Decode the base64 tiddler text into bytes and expose them through
        // an object URL instead of a data URL
        var array = Uint8Array.from(atob(text), c => c.charCodeAt(0));
        var blob = new Blob([array], {type: "application/pdf"});
        var uri = URL.createObjectURL(blob);
        element.attributes.src = {type: "string", value: uri};
    }
    this.tree = [element];
};

exports["application/pdf"] = ImageParser;

})();

caniuse.com indicates that URL.createObjectURL works for all recent browsers besides Opera Mini. I also notice it has been used in the download saving code if it's available.

I'm not a great JS coder, but I might be able to create a PR that generates src strings with URL.createObjectURL when it's available and falls back to data URLs otherwise. Or is it worth just accepting that URL.createObjectURL is a standard HTML5 feature and switching the parsers across to use it?
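A minimal sketch of that fallback idea, assuming the tiddler text is base64 as in the parser above. The function name `makePdfSrc` and the `env` parameter are made up for illustration; `env` only exists so the fallback path can be exercised without a browser.

```javascript
// Build a `src` value for an <embed> from base64-encoded PDF text.
// `env` defaults to the global object; passing a stub forces the fallback.
function makePdfSrc(base64Text, env = globalThis) {
    const canUseObjectUrl =
        typeof env.atob === "function" &&
        typeof env.Blob === "function" &&
        Boolean(env.URL) &&
        typeof env.URL.createObjectURL === "function";
    if (!canUseObjectUrl) {
        // Fallback: a plain base64 data URL.
        return "data:application/pdf;base64," + base64Text;
    }
    // Decode base64 into raw bytes, then wrap them in a Blob.
    const bytes = Uint8Array.from(env.atob(base64Text), c => c.charCodeAt(0));
    const blob = new env.Blob([bytes], {type: "application/pdf"});
    return env.URL.createObjectURL(blob);
}
```

Calling `makePdfSrc(text)` in a browser that supports object URLs should return a `blob:` URL, while `makePdfSrc(text, {})` always returns the data URL.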

pmario commented 4 years ago

please see: https://developer.mozilla.org/en-US/docs/Web/API/URL/createObjectURL

the object lifetime is bound to the document. So if you don't need it anymore you need to revoke it. Otherwise it can create memory leaks

m-schubert commented 4 years ago

> please see: https://developer.mozilla.org/en-US/docs/Web/API/URL/createObjectURL
>
> the object lifetime is bound to the document. So if you don't need it anymore you need to revoke it. Otherwise it can create memory leaks

Yep. I think that's OK, though. The parser is only called once due to the tiddler cache, so only one object URL is created. I don't think there is any safe time to revoke the object URL, given someone could reopen the tiddler at any time?
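One possible way to get a safe revocation point, sketched under assumptions: keep one object URL per tiddler and revoke it only when that tiddler is changed or deleted, so reopening an unchanged tiddler reuses the same URL. `ObjectUrlCache` is a hypothetical helper, not part of TiddlyWiki, and this thread does not establish which hook would call `release`. The create and revoke functions are injected so the logic can run outside a browser.

```javascript
// Hypothetical helper (not TiddlyWiki API): one object URL per key,
// revoked when the entry is released.
class ObjectUrlCache {
    constructor(createUrl, revokeUrl) {
        this.createUrl = createUrl; // e.g. blob => URL.createObjectURL(blob)
        this.revokeUrl = revokeUrl; // e.g. url => URL.revokeObjectURL(url)
        this.urls = new Map();
    }

    // Return the cached URL for `key`, creating it on first use.
    get(key, blob) {
        if (!this.urls.has(key)) {
            this.urls.set(key, this.createUrl(blob));
        }
        return this.urls.get(key);
    }

    // Revoke and forget the URL for `key`; does nothing for unknown keys.
    release(key) {
        if (this.urls.has(key)) {
            this.revokeUrl(this.urls.get(key));
            this.urls.delete(key);
        }
    }
}
```

In a browser this could be constructed as `new ObjectUrlCache(b => URL.createObjectURL(b), u => URL.revokeObjectURL(u))`. After a `release`, the next `get` for the same key simply creates a fresh URL.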

At this point I'm most interested in advice on how I might structure a PR. Where else are data URLs used that we might be able to also make object URLs?

kigun-org commented 1 week ago

Thank you m-schubert, I ran into the same issue and used your code above in my local installation and it works great. I've submitted a PR just to get the ball rolling, as I see there hasn't been any activity on this in 4 years. I hope you don't mind.

Jermolene commented 1 week ago

Hi @kigun-org @m-schubert, this solution does not work when generating static HTML renderings of tiddlers. The problem is that these renderings are saved as plain text, and a blob URL is only valid within the document that created it, so the saved blob URLs will be broken.

I think the cleanest solution is to store the PDFs as external tiddlers. Is that feasible for you?

kigun-org commented 1 week ago

Hi @Jermolene, thank you for the prompt reply. I did not know enough about the code to realize it would break static rendering, sorry about that.

My use case is having a group of students write content and upload relevant PDFs/images, so having them drag and drop files directly into the TiddlyWiki is by far the most frictionless way for them to contribute. Adding a separate upload interface and explaining concepts like _canonical_uri seems a bit too complex (unless there's a plug-in to streamline that?).

Would having a flag enabling this code path only for web rendering be a possibility? Again, I don't know enough about the code to know if it's feasible/maintainable, but I'm happy to help out if you can point me in the right direction. Otherwise, I'll just patch the change in for my installation.
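To make the question concrete, here is a rough sketch of what such a flag might look like. Everything in it is hypothetical: `choosePdfSrc`, the `interactive` option, and the injected `makeObjectUrl` function are illustrative names, not existing TiddlyWiki options, and how the parser would learn that it is rendering for a live browser view is left open.

```javascript
// Decide which `src` to emit for a PDF tiddler.
// `fields` mimics tiddler fields; `interactive` is a hypothetical flag;
// `makeObjectUrl` is an injected function turning base64 text into a blob URL.
function choosePdfSrc(fields, {interactive = false, makeObjectUrl = null} = {}) {
    if (fields._canonical_uri) {
        // External tiddler: link to the file wherever it is hosted.
        return fields._canonical_uri;
    }
    if (interactive && typeof makeObjectUrl === "function") {
        // Live browser view: a blob URL sidesteps the data URL size problem.
        return makeObjectUrl(fields.text);
    }
    // Static rendering, or no blob support: keep the output self-contained.
    return "data:application/pdf;base64," + fields.text;
}
```

With this shape, static rendering would keep producing data URLs by default, and only a caller that explicitly opts in would get blob URLs.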

Thank you for all your work on TiddlyWiki, I think it's a great project.

m-schubert commented 1 week ago

@kigun-org, I'm glad the code snippet helped, and thanks for opening the PR.

@Jermolene, I have similar concerns to @kigun-org. I was using the NodeJS backend, and having to host a separate service for uploading larger files, with the added complication of linking to them, just feels a bit clunky.

Some questions:

Are the data URLs used directly for static HTML rendering, or is the data extracted and saved as a separate file? I'm guessing the former is the easy solution.
How does the static HTML rendering handle external tiddlers? Are they downloaded and cached, or is a link just maintained to them even for the statically rendered site?

Jermolene commented 1 week ago

Hi @m-schubert

> Are the data URLs used directly for static HTML rendering, or is the data extracted and saved as a separate file? I'm guessing the former is the easy solution.

Ordinary, embedded images appear as data URLs in the static HTML rendering. You can see some examples at https://tiddlywiki.com/static.html

> How does the static HTML rendering handle external tiddlers? Are they downloaded and cached, or is a link just maintained to them even for the statically rendered site?

Images with a _canonical_uri appear as external image links in the static HTML rendering.

Jermolene / TiddlyWiki5

[BUG] Data URL has limits in Chrome for PDFs #4575