gildas-lormeau / SingleFile

Web Extension for saving a faithful copy of a complete web page in a single HTML file
GNU Affero General Public License v3.0

Namespaced elements of XHTML and CSS are not correctly handled #1513

Open danny0838 opened 3 months ago

danny0838 commented 3 months ago

Describe the bug

Namespaced elements of XHTML and CSS are not correctly handled.

To Reproduce

Steps to reproduce the behavior:

  1. Create the test page and CSS somewhere:
    • element.xhtml
      <!DOCTYPE html>
      <html xmlns="http://www.w3.org/1999/xhtml"
          xmlns:myns="http://example.com/myns">
      <head>
      <meta charset="UTF-8" />
      <style>
      @namespace myns url("http://example.com/myns");
      myns|elem-1 { background-color: lime; }
      </style>
      <style>
      @namespace url("http://example.com/myns");
      elem-2 { background-color: lime; }
      </style>
      <style>
      @namespace myns url("http://example.com/myns");
      myns|elem-3 { background-color: lime; }
      </style>
      <style>
      @namespace url("http://example.com/myns");
      elem-4 { background-color: lime; }
      </style>
      <style>
      @import "./element-import.css";
      elem-6 { background-color: lime; }
      </style>
      </head>
      <body>
      <blockquote>
      <myns:elem-1>myns:elem-1</myns:elem-1>
      </blockquote>
      <blockquote>
      <myns:elem-2>myns:elem-2</myns:elem-2>
      </blockquote>
      <blockquote>
      <elem-3 xmlns="http://example.com/myns">elem-3</elem-3>
      </blockquote>
      <blockquote>
      <elem-4 xmlns="http://example.com/myns">elem-4</elem-4>
      </blockquote>
      <blockquote>
      <elem-5 xmlns="http://example.com/myns">elem-5</elem-5>
      </blockquote>
      <blockquote>
      <elem-6>elem-6</elem-6>
      </blockquote>
      </body>
      </html>
    • element-import.css
      @namespace url("http://example.com/myns");
      elem-5 { background-color: lime; }
  2. Save the page with default options
  3. Open the saved page

Expected behavior

Environment

gildas-lormeau commented 3 months ago

Thank you very much for the detailed report!

It looks like the proper way to fix this issue would be to always use a separate <style> element for each @import instead of inlining their content. I'll have to think about it... Out of curiosity, how do you handle this problem in webscrapbook?
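A rough sketch of the "one <style> per @import" idea, not SingleFile's actual code. It assumes stylesheet text can be looked up by URL through a hypothetical `getCss` callback, and it uses a naive regular expression to find `@import` rules (it drops any media, layer, or supports condition and does not understand comments or escapes). The function name `collectStyleBlocks` and the example.com URLs are made up for illustration. Each returned string is meant to go into its own <style> element, so an `@namespace` rule in an imported sheet stays scoped to that sheet only.

```javascript
// Returns one CSS string per stylesheet, imported sheets first, so that each
// can be written to its own <style> element and keep its own @namespace scope.
// Assumption: getCss(url) returns the stylesheet text for an absolute URL.
function collectStyleBlocks(url, getCss, seen = new Set()) {
  if (seen.has(url)) return []; // already emitted (also stops circular imports)
  seen.add(url);
  // Naive matcher for `@import "x";` and `@import url("x");` (quoted URLs only).
  const importRe = /@import\s+(?:url\(\s*)?(["'])(.*?)\1\s*\)?[^;]*;/g;
  const blocks = [];
  const own = getCss(url).replace(importRe, (match, quote, ref) => {
    const absolute = new URL(ref, url).href; // resolve against the importing sheet
    blocks.push(...collectStyleBlocks(absolute, getCss, seen));
    return ""; // the import is now represented by its own block(s)
  });
  blocks.push(own.trim());
  return blocks;
}

// Hypothetical stand-in for the last <style> of the test page and its import.
const pageSheets = new Map([
  ["https://example.com/page.css",
    '@import "./element-import.css";\nelem-6 { background-color: lime; }'],
  ["https://example.com/element-import.css",
    '@namespace url("http://example.com/myns");\nelem-5 { background-color: lime; }'],
]);

const blocks = collectStyleBlocks(
  "https://example.com/page.css",
  (u) => pageSheets.get(u) ?? ""
);
```

With this input, `blocks` holds two entries: the imported sheet with its `@namespace` rule, followed by the importing sheet's own `elem-6` rule, which no longer shares a stylesheet with that `@namespace`.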

danny0838 commented 3 months ago

WebScrapBook normally saves each @import as a separate file, and thus doesn't natively have the issue.

When saving as single HTML, WebScrapBook recursively converts the <link> and @import content into a data URL (circular imports need some special care). This is ugly and not space-efficient, but such stupid fidelity can prevent many potential issues introduced by converting a <link> or @import into <style>, for example:

  1. When the @import contains a media query, a layer, or a supports condition.
  2. When the @import appears after another CSS rule (which makes it invalid by spec and ignored by most modern browsers).
  3. When the <link> is an alternate stylesheet and may have been switched by the browser (e.g. Firefox supports switching alternate stylesheets).
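The recursive data-URL conversion described above could look roughly like the following. This is a minimal sketch under stated assumptions, not WebScrapBook's real implementation: `getCss` is a hypothetical callback that returns stylesheet text for an absolute URL, the function names and example.com URLs are invented, and the regular expression only recognizes quoted `@import` URLs. Because only the URL is rewritten, any media, layer, or supports condition after it is left untouched, and a circular import is replaced by an empty stylesheet.

```javascript
// Encode CSS text as a data URL; encodeURIComponent escapes double quotes,
// so the result is safe inside a double-quoted CSS string.
function toCssDataUrl(css) {
  return "data:text/css;charset=utf-8," + encodeURIComponent(css);
}

// Rewrites every `@import "x"` / `@import url("x")` in the sheet at `url`
// so that it points at a data URL containing the (recursively rewritten) import.
// `ancestors` holds the URLs on the current import chain to detect cycles.
function inlineImportsAsDataUrls(url, getCss, ancestors = new Set()) {
  if (ancestors.has(url)) return ""; // circular import: cut the chain here
  const chain = new Set(ancestors).add(url);
  // Naive matcher: captures the rule prefix, the quote, and the quoted URL.
  const importRe = /(@import\s+(?:url\(\s*)?)(["'])(.*?)\2/g;
  return getCss(url).replace(importRe, (match, prefix, quote, ref) => {
    const absolute = new URL(ref, url).href; // resolve against the importing sheet
    const nested = inlineImportsAsDataUrls(absolute, getCss, chain);
    return `${prefix}"${toCssDataUrl(nested)}"`;
  });
}

// Hypothetical sheets: a.css imports b.css (screen only), b.css imports a.css back.
const cssFiles = new Map([
  ["https://example.com/a.css",
    '@import "./b.css" screen;\nelem-6 { background-color: lime; }'],
  ["https://example.com/b.css",
    '@import url("./a.css");\n@namespace url("http://example.com/myns");\nelem-5 { background-color: lime; }'],
]);

const inlined = inlineImportsAsDataUrls(
  "https://example.com/a.css",
  (u) => cssFiles.get(u) ?? ""
);
```

Here `inlined` still begins with an `@import ... screen;` rule, so the media condition and the imported sheet's own `@namespace` scope both survive; the cost is the size overhead of percent-encoding nested sheets.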

P.S.: As a developer I really hate all the CSS related things...

gildas-lormeau commented 3 months ago

Sorry, I had forgotten that webscrapbook saves resources separately. In fact, I do more or less the same thing (I guess) in SingleFile when saving the page in self-extracting format. The only major difference is that I use Blob URIs.

I'm still looking into this problem and I can confirm that it's probably going to take me a while to fix it because of the refactoring. Anyway, I think it should be doable. For now, I'm just resting my brain to get back to it later.

I completely agree with you about CSS. For example, I'm not a fan of the fact that the URL passed to the FontFace constructor is inaccessible via the DOM. More recently, I've (re-)discovered CSS Worklets (see also the paint function). It seems that some sites are starting to use them (see https://github.com/gildas-lormeau/SingleFile/issues/1502). I'd also like to avoid having to use a third-party library to parse the CSS. And let's not even talk about constructed (and adopted) stylesheets... It was tense enough with the Shadow DOM, I hope Houdini APIs won't make our extensions obsolete.

danny0838 commented 3 months ago

It is unfortunate that there is no standard-compliant CSS parser available, which makes many things difficult to handle. Although browsers support CSSOM, it's not enough to retrieve details of a specific cssText in a rule, such as referenced CSS variables.
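To illustrate the gap: without a real parser, finding the CSS variables referenced in a rule's cssText tends to fall back on something like the naive scan below. This is only a sketch; the function name `referencedCustomProperties` is hypothetical, and the regular expression knows nothing about strings, comments, or escapes.

```javascript
// Collects the custom property names that appear as the first argument of
// var() in a piece of CSS text, in order of first appearance.
// Naive by design: it scans raw text rather than parsed tokens.
function referencedCustomProperties(cssText) {
  const varRe = /var\(\s*(--[A-Za-z0-9_-]+)/g;
  const names = new Set();
  for (const match of cssText.matchAll(varRe)) {
    names.add(match[1]);
  }
  return [...names];
}

const found = referencedCustomProperties(
  "color: var(--fg, var(--fallback)); margin: calc(var( --gap ) * 2)"
);
```

For the sample above, `found` is `["--fg", "--fallback", "--gap"]`. The same function also reports `--not-real` for `content: "var(--not-real)"`, where the text is just a string literal, which is exactly the kind of mistake a standard-compliant tokenizer would avoid.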

The closest one so far is parse-css. Unfortunately it's still buggy and needs improvement, and the maintainer still hasn't reviewed my patch PRs.