gildas-lormeau / SingleFile

Web Extension for saving a faithful copy of a complete web page in a single HTML file
GNU Affero General Public License v3.0
14.29k stars 943 forks

Add option for space saving without deleting newlines #1440

Closed runxel closed 2 months ago

runxel commented 2 months ago

I use SingleFile not only for long-term storage, but also to feed Beautiful Soup and/or to look at and manipulate the code. VSCode has trouble with it, though, and stops tokenization when lines get too long. Since the minimization option also removes all newlines, the HTML file ends up as a single line. I've experienced degraded performance because of that on several occasions.

We need an option for minimizing the HTML without removing newlines, i.e. one that preserves them.
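One way to check whether a saved page is affected is to measure its line lengths. VSCode exposes its tokenization cutoff as the `editor.maxTokenizationLineLength` setting (20000 characters by default), so that value is used as the threshold below. This is a minimal sketch, assuming the saved page has already been read into a string; `line_stats` is a hypothetical helper, not part of SingleFile.

```python
# Assumption: 20000 mirrors VSCode's default editor.maxTokenizationLineLength.
DEFAULT_LIMIT = 20_000


def line_stats(text: str, limit: int = DEFAULT_LIMIT) -> dict:
    """Return the line count, the longest line length, and the 1-based
    numbers of the lines longer than `limit` characters."""
    lines = text.splitlines()
    lengths = [len(line) for line in lines]
    return {
        "lines": len(lines),
        "longest": max(lengths, default=0),  # 0 for an empty document
        "too_long": [i for i, n in enumerate(lengths, start=1) if n > limit],
    }
```

For example, `line_stats(Path("page.html").read_text(encoding="utf-8"))` (with `from pathlib import Path`) reports a `lines` value of 1 for a page minified into a single line.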

gildas-lormeau commented 2 months ago

SingleFile should not remove newlines in HTML. For example, if you save https://example.com/ with SingleFile, the resulting content still contains the original newlines; see the source code below.

<!DOCTYPE html> <html><!--
 Page saved with SingleFile 
 url: https://example.com/ 
 saved date: Tue Apr 30 2024 01:11:30 GMT+0200 (Central European Summer Time)
--><meta charset=utf-8>
<title>Example Domain</title>
<meta name=viewport content="width=device-width, initial-scale=1">
<style>body{background-color:#f0f0f2;margin:0;padding:0;font-family:-apple-system,system-ui,BlinkMacSystemFont,"Segoe UI","Open Sans","Helvetica Neue",Helvetica,Arial,sans-serif}div{width:600px;margin:5em auto;padding:2em;background-color:#fdfdff;border-radius:0.5em;box-shadow:2px 3px 7px 2px rgba(0,0,0,0.02)}a:link,a:visited{color:#38488f;text-decoration:none}@media (max-width:700px){div{margin:0 auto;width:auto}}</style>
<meta name=referrer content=no-referrer><link rel=canonical href=https://example.com/><meta http-equiv=content-security-policy content="default-src 'none'; font-src 'self' data:; img-src 'self' data:; style-src 'unsafe-inline'; media-src 'self' data:; script-src 'unsafe-inline' data:; object-src 'self' data:; frame-src 'self' data:;"><style>img[src="data:,"],source[src="data:,"]{display:none!important}</style></head>
<body>
<div>
 <h1>Example Domain</h1>
 <p>This domain is for use in illustrative examples in documents. You may use this
 domain in literature without prior coordination or asking for permission.</p>
 <p><a href=https://www.iana.org/domains/example>More information...</a></p>
</div>

However, this problem does occur with the content of inline stylesheets, and unfortunately I can't fix it easily: it's an issue in the library SingleFile uses to parse CSS (see https://github.com/csstree/csstree/issues/237).
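To confirm that the remaining long lines come from inline stylesheets rather than from the HTML itself, one could list the length and newline count of each `<style>` element. A minimal sketch using only Python's standard `html.parser` module (Beautiful Soup would work as well); `StyleCollector` and `inline_style_report` are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser


class StyleCollector(HTMLParser):
    """Collects the raw text content of every <style> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_style = False
        self.styles: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased, so "STYLE" is matched too.
        if tag == "style":
            self._in_style = True
            self.styles.append("")

    def handle_endtag(self, tag):
        if tag == "style":
            self._in_style = False

    def handle_data(self, data):
        # The parser may deliver text in chunks, so append rather than assign.
        if self._in_style:
            self.styles[-1] += data


def inline_style_report(html: str) -> list[tuple[int, int]]:
    """Return (character count, newline count) for each inline stylesheet."""
    parser = StyleCollector()
    parser.feed(html)
    parser.close()
    return [(len(css), css.count("\n")) for css in parser.styles]
```

A stylesheet with many characters but a newline count of 0, like the one in the example.com output, is the kind of content that ends up on a single line.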

gildas-lormeau commented 2 months ago

I'm closing the issue because it's actually a duplicate of https://github.com/gildas-lormeau/SingleFile/issues/1220.

gildas-lormeau commented 2 months ago

Note that if you confirm the issue is mainly related to CSS content, I could add an option that inserts a newline after each CSS rule to work around it.
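A similar effect can be approximated outside SingleFile by post-processing a saved file. This is a minimal sketch, not SingleFile's implementation, assuming the CSS lives in `<style>` elements whose attributes contain no `>` character; `break_css_rules` and `break_inline_styles` are hypothetical names. It inserts a newline after every `}` except inside quoted strings, so a `}` inside a CSS comment or an unquoted `url()` value would still be split.

```python
import re

# Matches <style ...>content</style>; assumes no ">" inside the tag's attributes.
STYLE_RE = re.compile(r"(<style\b[^>]*>)(.*?)(</style\s*>)", re.IGNORECASE | re.DOTALL)


def break_css_rules(css: str) -> str:
    """Insert a newline after every '}' that is not inside a quoted string."""
    out = []
    quote = None  # current string delimiter, or None outside strings
    escaped = False  # True right after a backslash inside a string
    for ch in css:
        out.append(ch)
        if quote:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == quote:
                quote = None
        elif ch in ('"', "'"):
            quote = ch
        elif ch == "}":
            out.append("\n")
    return "".join(out)


def break_inline_styles(html: str) -> str:
    """Apply break_css_rules to the content of every <style> element."""
    return STYLE_RE.sub(
        lambda m: m.group(1) + break_css_rules(m.group(2)) + m.group(3), html
    )
```

Whitespace between CSS rules is not significant, so the extra newlines should not change how the page renders. `style` attributes on individual elements are left untouched.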