gildas-lormeau / SingleFile

Web Extension for saving a faithful copy of a complete web page in a single HTML file
GNU Affero General Public License v3.0

`np-coburg.de`: slideshow images not saved #1533

Closed: mara004 closed this issue 1 month ago

mara004 commented 1 month ago

Describe the bug

When saving pages from np-coburg.de, image slideshows are broken. Only the first image is included; all others are lost.

I have read the FAQs "Why aren't images saved on [some sites]?" and "Why don't interactive elements [...] work properly in saved pages?", but enabling the options mentioned there does not help. In fact, it makes all images disappear, even the first.

To Reproduce

  1. Go to any article with a slideshow on np-coburg.de, e.g. https://www.np-coburg.de/inhalt.lage-im-ueberblick-brand-im-kuehlsystem-von-akw-saporischschja-geloescht.1088cf39-0e92-41d1-bd9d-3d1d9856340c.html
  2. Save with SingleFile by clicking its icon
  3. View the output. It shows only image 1; the slideshow controls and subsequent images are missing (in this case, images 2 to 4).

Expected behavior

All images should be included; the slideshow should work.

Screenshots

Online page vs. saved page (two screenshots, not reproduced here). In the saved page, the slideshow is not navigable.

Environment

Additional context

archive.org and archive.ph also fail to save the slideshows: https://archive.ph/2kbOf https://web.archive.org/web/20240812175851/https://www.np-coburg.de/inhalt.lage-im-ueberblick-brand-im-kuehlsystem-von-akw-saporischschja-geloescht.1088cf39-0e92-41d1-bd9d-3d1d9856340c.html

gildas-lormeau commented 1 month ago

I ran a test and can confirm that SingleFile cannot save the images because they are retrieved dynamically after the page has loaded. You can work around the issue by disabling the option HTML Content > set content security policy (and following the advice in the FAQ). However, the images then won't be visible when viewing the page offline, or if they disappear from the website one day. I have attached an example of a saved page.

Ukraine-Krieg_ Putin will endlich Ruhe an neuer Front von Kursk - Politik - Neue Presse Coburg (8_12_2024 8_19_00 PM).html.zip
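
To check which images in a saved page still depend on the network, something like the following sketch might help. It is only an illustration: it assumes the remote images show up as ordinary `<img src="...">` attributes in the saved HTML, which may not hold for this site, and the names `ImageSourceCollector` and `split_image_sources` are made up for this example.

```python
from html.parser import HTMLParser


class ImageSourceCollector(HTMLParser):
    """Collect the src attribute of every <img> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def split_image_sources(html_text):
    """Return (embedded, remote) lists of <img> sources.

    Embedded sources are data: URIs stored inside the file itself;
    remote sources still need a network request to be displayed.
    """
    collector = ImageSourceCollector()
    collector.feed(html_text)
    embedded = [s for s in collector.sources if s.startswith("data:")]
    remote = [
        s for s in collector.sources
        if s.startswith(("http://", "https://", "//"))
    ]
    return embedded, remote
```

Any URL that ends up in the `remote` list is an image that would be missing when the page is opened offline.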

mara004 commented 1 month ago

I see, thanks for the quick reply.

Unfortunately that's not too much use to me, as what I wish to do is archive standalone, offline snapshots (of some of the paid articles, actually). So I guess I'll have to download the additional images manually and store them in an accompanying directory (duh).

Thanks for this nice tool anyway; it works well on many other websites that don't do this dynamic loading.
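
The manual workaround described above (downloading the extra images into an accompanying directory) could be scripted roughly as follows. This is a minimal sketch under assumptions: the image URLs are already known, the helper names `local_name`, `fetch_bytes` and `save_images` are invented for illustration, and a paywalled site would probably also need session cookies, which this sketch does not handle.

```python
import os
import urllib.request
from urllib.parse import urlsplit


def local_name(url, index):
    """Build a local filename such as '02_photo.jpg' from an image URL."""
    base = os.path.basename(urlsplit(url).path) or "image"
    return f"{index:02d}_{base}"


def fetch_bytes(url):
    """Download one URL with the standard library (no cookies, no retries)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def save_images(urls, target_dir, fetch=fetch_bytes):
    """Save each URL into target_dir and return the written file paths.

    The fetch parameter can be swapped out, e.g. for a function that
    sends the cookies of a logged-in session.
    """
    os.makedirs(target_dir, exist_ok=True)
    written = []
    for index, url in enumerate(urls, start=1):
        path = os.path.join(target_dir, local_name(url, index))
        with open(path, "wb") as handle:
            handle.write(fetch(url))
        written.append(path)
    return written
```

The numeric prefix keeps the files in slideshow order, so the directory can sit next to the saved HTML file without further bookkeeping.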

gildas-lormeau commented 1 month ago

I'm glad to hear it works sometimes ;)

On paper, the WARC format is more suited to your needs. This format is designed to store all network exchanges. FYI, you can find a list of tools to handle WARC files here: https://wiki.archiveteam.org/index.php/The_WARC_Ecosystem
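
To give a rough idea of what "storing network exchanges" means, here is a simplified sketch of how a single captured HTTP response could be wrapped in a WARC/1.0 `response` record. The function name `build_warc_response_record` is made up for this illustration, and real archiving should rely on one of the dedicated tools from the list above; the sketch does nothing about dynamic loading or paywalls.

```python
import uuid
from datetime import datetime, timezone


def build_warc_response_record(target_uri, http_response_bytes):
    """Serialize one raw HTTP response as a WARC/1.0 'response' record.

    Simplified illustration: a record is a block of named header fields,
    an empty line, the captured bytes, and two trailing CRLF pairs.
    """
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.0",
        "WARC-Type: response",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {now}",
        f"WARC-Target-URI: {target_uri}",
        "Content-Type: application/http; msgtype=response",
        f"Content-Length: {len(http_response_bytes)}",
    ]
    header_block = ("\r\n".join(headers) + "\r\n\r\n").encode("utf-8")
    return header_block + http_response_bytes + b"\r\n\r\n"
```

Because every request and response is recorded individually, images fetched after page load can be kept alongside the HTML, provided the capturing tool actually triggers those requests.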

mara004 commented 1 month ago

Thanks for the link, I'll give WARC a try. Do you have a recommendation as to which of these tools I should use? That said, seeing as the archive sites failed too, I'm not sure I'll have more luck with WARC. Also, if I end up with a command-line tool, how will it cope with the paywalls?

gildas-lormeau commented 1 month ago

It would be difficult for me to give you more information, as I haven't had the time or opportunity to test the tools.