gildas-lormeau / single-file-cli

CLI tool for saving a faithful copy of a complete web page in a single HTML file (based on SingleFile)
GNU Affero General Public License v3.0

Save selection feature #82

Open patchhg opened 2 months ago

patchhg commented 2 months ago

First of all, thank you so much for all your great work on SingleFile over the years. It is such a great tool for storing technical articles in my knowledge base, rather than keeping a million links that randomly go offline or saving screenshots of code snippets. Some of the articles I've saved are no longer available even on the Wayback Machine.

I need to save a bunch of articles from Medium and was hoping I could automate this with the SingleFile CLI. Unfortunately, I can't find any CLI argument that saves a page selection the way the SingleFile extension does.

Do you have any suggestions for implementing this? I was thinking of having a script after page load which manually removes the unnecessary DOM elements but this is not ideal. Is there a better way to emulate the behavior of the browser extension?

In case you deem it relevant, perhaps this might also be a good idea for a feature request.

gildas-lormeau commented 2 months ago

You can run a script when saving a page with the --browser-script option. The problem, particularly on Medium, is that writing such a script reliably may be complicated. For example, on Medium, all the elements have minified class names (see the HTML below) that are not guaranteed to stay constant.

...
<div class="ui t uj v uk ul um un uo up ab q cn fi">
    <div class="uq ur us ut uu l">
        <div class="am l fr uv uw">
            <div class="h k">
                <div class="l fi n ux">
                    <button class="af ag ah ai aj ak al am an ao ap aq ar as at ab" data-testid="close-button" aria-label="close">
                    ...
                </div>
            </div>
        </div>
    </div>
</div>
...
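Since the class names in the snippet above are minified, one possible alternative is to target attributes that look more stable, such as the data-testid on the close button. The following is only a minimal sketch: clickByStableSelector is a hypothetical helper, and it assumes both that the data-testid="close-button" value does not change and that clicking the button actually dismisses the surrounding banner, neither of which is confirmed in this thread.

```javascript
// Hypothetical helper: click every element matching an attribute-based
// selector and return how many elements were clicked.
function clickByStableSelector(root, selector) {
    let clicked = 0;
    root.querySelectorAll(selector).forEach(element => {
        element.click();
        clicked++;
    });
    return clicked;
}

// Only hook into the page when running in a browser context.
if (typeof document !== "undefined") {
    addEventListener("load", () => {
        // Assumption: this attribute value (taken from the HTML above) is stable.
        clickByStableSelector(document, '[data-testid="close-button"]');
    });
}
```

If the attribute turns out to change as well, the selector matches nothing and the page is saved unmodified.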

If by any chance you're interested in removing the bottom banner, you could run ./single-file --browser-script=medium-script.js "https://medium.com/..." with the script below. Note that it tries to work in a generic way, without relying on class names.

medium-script.js

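// Once the page has loaded, remove every fixed-position element
// (e.g. the bottom banner) unless it is marked aria-hidden="true".
addEventListener("load", () => {
    const elements = document.querySelectorAll("*");
    elements.forEach(element => {
        // Read the attribute directly: a string such as "false" would be truthy.
        if (element.getAttribute("aria-hidden") !== "true") {
            const style = getComputedStyle(element);
            if (style.position === "fixed") {
                element.remove();
            }
        }
    });
});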
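
To get closer to the extension's "save selection" behavior, the same --browser-script mechanism could, in principle, strip everything except one chosen element. The sketch below is an illustration rather than a tested recipe for Medium: keepOnly is a hypothetical helper, and it assumes the article body lives in an <article> element, which may not hold for every page. It keeps the target and its ancestors and removes all other siblings up to <body>, leaving <head> (and its styles) untouched.

```javascript
// Hypothetical helper: keep `target` and its ancestors, removing every
// sibling at each level until `stopAt` (e.g. document.body) is reached.
function keepOnly(target, stopAt) {
    let node = target;
    while (node && node !== stopAt && node.parentElement) {
        const parent = node.parentElement;
        // Copy the child list first, since removing nodes mutates it.
        Array.from(parent.children).forEach(sibling => {
            if (sibling !== node) {
                sibling.remove();
            }
        });
        node = parent;
    }
}

// Only hook into the page when running in a browser context.
if (typeof document !== "undefined") {
    addEventListener("load", () => {
        // Assumption: the content to keep is the first <article> element.
        const article = document.querySelector("article");
        if (article) {
            keepOnly(article, document.body);
        }
    });
}
```

Note that this also removes any style or script elements placed outside the kept branch inside <body>, so the saved page may render differently from the original.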