WebMemex / webmemex-extension

📇 Your digital memory extension, as a browser extension
https://webmemex.org

Keep <noscript> when appropriate (was: Images not in snapshots from Medium.com) #134

Closed dvn0 closed 6 years ago

dvn0 commented 6 years ago

Images in articles from Medium are not showing up in my snapshots.

This page for example.

To reproduce:

  1. Turn on the NoScript extension, blocking all JS.
  2. Snapshot the page with WebMemex.

Treora commented 6 years ago

Thanks for reporting; there seem to be more issues with images, which I have not yet understood. I am currently rewriting freeze-dry, the code that does the snapshotting, so I will try to fix this issue there.

Treora commented 6 years ago

Not sure why I did not realise this before: Medium.com lazily loads each image using JavaScript, and also includes the image in a <noscript> tag so that it shows when scripts are disabled. But freeze-dry removes all <noscript> tags, under the assumption that scripts have been executed, e.g. in order not to end up with both copies of the image in this case. So the solution is to keep <noscript> tags, and perhaps convert them into <div> tags, when the page was viewed with JavaScript disabled. See the corresponding comment in freeze-dry:

// If noscript content was not shown, we do not want it to show in the snapshot either. Also, we
// capture pages after scripts executed (presumably), so noscript content is likely undesired.
// TODO We should know whether noscript content was visible, and if so keep it in the doc.
// TODO Keep noscript content in fetched iframe docs, as scripts have not been executed there?
const noscripts = Array.from(doc.querySelectorAll('noscript'))
noscripts.forEach(element => element.parentNode.removeChild(element))
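
The proposed change could look roughly like the sketch below. It is not freeze-dry's actual code: the function name processNoscripts and the scriptsWereDisabled option are made up for illustration, and how to detect whether scripts were disabled is left open. It also assumes the children of each <noscript> were parsed as real elements, which is the case when scripting was disabled at parse time.

```javascript
// Hypothetical sketch: keep <noscript> content when the page was viewed
// without scripts, otherwise remove it as the current code does.
// `scriptsWereDisabled` is an assumed flag; detecting it is not covered here.
function processNoscripts(doc, { scriptsWereDisabled = false } = {}) {
  const noscripts = Array.from(doc.querySelectorAll('noscript'))
  noscripts.forEach(element => {
    if (!element.parentNode) return

    if (!scriptsWereDisabled) {
      // Scripts ran (presumably), so the noscript content was never shown.
      element.parentNode.removeChild(element)
      return
    }

    // Scripts were off: the noscript content was what the user saw.
    // Move its children into a <div> so it also shows in the snapshot,
    // even when the snapshot is opened in a browser with scripts enabled.
    const div = doc.createElement('div')
    while (element.firstChild) {
      div.appendChild(element.firstChild)
    }
    element.parentNode.replaceChild(div, element)
  })
}
```

With the flag left at its default, this behaves like the two lines above; with scriptsWereDisabled set to true, Medium's fallback images would stay in the snapshot.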

Good to have a real-world use case for this now.

Treora commented 6 years ago

I suppose it makes sense to move this issue to the freeze-dry repo: #32 there.