danny0838 / webscrapbook

A browser extension that captures web pages to a local device or a backend server for later retrieval, organization, annotation, and editing. This project inherits from the legacy Firefox add-on ScrapBook X.
Mozilla Public License 2.0

Can't remove cookie notices #382

jaafonso opened 2 months ago

jaafonso commented 2 months ago

I captured the page below, but the cookie warning covers the whole page, which prevents me from reading the content.

https://www.nature.com/articles/d41586-023-01690-x

I could not find an option to exclude cookie warnings from the capture.

I'm using Vivaldi for Windows, and I enabled all cookie notice filters in uBlock Origin before loading the page in the browser.

danny0838 commented 2 months ago

Please provide the capture options you use (copy from Capture as > Advanced).

jaafonso commented 2 months ago

Please see the information you requested below. I'm using the default options.

If I disable the ad-block filters, accept the cookies manually, and then re-enable ad blocking, the cookie notice no longer appears when I make a new capture of this site. But this is not ideal, especially since the same problem may occur with other sites.

{
 "tasks": [
  {
   "comment": "",
   "tabId": 299064096,
   "title": "Daily briefing: Visual clutter skews our time perception",
   "url": "https://www.nature.com/articles/d41586-024-01202-5"
  }
 ],
 "bookId": null,
 "parentId": "root",
 "index": null,
 "mode": "",
 "delay": null,
 "options": {
  "capture.applet": "blank",
  "capture.audio": "save",
  "capture.backupForRecapture": true,
  "capture.base": "blank",
  "capture.canvas": "save",
  "capture.contentSecurityPolicy": "remove",
  "capture.deleteErasedOnCapture": true,
  "capture.deleteErasedOnSave": false,
  "capture.downLink.doc.delay": null,
  "capture.downLink.doc.depth": null,
  "capture.downLink.doc.mode": "source",
  "capture.downLink.doc.urlFilter": "",
  "capture.downLink.file.extFilter": "###image\n#bmp, gif, ico, jpg, jpeg, jpe, jp2, png, tif, tiff, svg\n###audio\n#aac, ape, flac, mid, midi, mp3, ogg, oga, ra, ram, rm, rmx, wav, wma\n###video\n#avc, avi, flv, mkv, mov, mpg, mpeg, mp4, wmv\n###archive\n#zip, rar, jar, bz2, gz, tar, rpm, 7z, 7zip, xz, jar, xpi, lzh, lha, lzma\n#/z[0-9]{2}|r[0-9]{2}/\n###document\n#pdf, doc, docx, xls, xlsx, ppt, pptx, odt, ods, odp, odg, odf, rtf, txt, csv\n###executable\n#exe, msi, dmg, bin, xpi, iso\n###any non-web-page\n#/(?!$|html?|xht(ml)?|php|py|pl|aspx?|cgi|jsp)(.*)/i",
  "capture.downLink.file.mode": "none",
  "capture.downLink.urlExtra": "",
  "capture.downLink.urlFilter": "###skip common logout URL\n/[/=]logout\\b/i",
  "capture.downloadRetryCount": 3,
  "capture.downloadRetryDelay": 1000,
  "capture.downloadWorkers": 4,
  "capture.embed": "blank",
  "capture.favicon": "save",
  "capture.faviconAttrs": "",
  "capture.font": "save-used",
  "capture.formStatus": "keep",
  "capture.frame": "save",
  "capture.frameRename": true,
  "capture.helpers": "",
  "capture.helpersEnabled": false,
  "capture.image": "save",
  "capture.imageBackground": "save-used",
  "capture.insertInfoBar": false,
  "capture.linkUnsavedUri": false,
  "capture.mergeCssResources": true,
  "capture.noscript": "save",
  "capture.object": "blank",
  "capture.ping": "blank",
  "capture.prefetch": "remove",
  "capture.preload": "remove",
  "capture.prettyPrint": false,
  "capture.recordDocumentMeta": true,
  "capture.recordRewrites": false,
  "capture.referrerPolicy": "",
  "capture.referrerSpoofSource": false,
  "capture.remoteTabDelay": null,
  "capture.removeHidden": "none",
  "capture.resourceSizeLimit": null,
  "capture.rewriteCss": "url",
  "capture.saveAs": "folder",
  "capture.saveAsciiFilename": false,
  "capture.saveDataUriAsFile": true,
  "capture.saveDataUriAsSrcdoc": true,
  "capture.saveFileAsHtml": false,
  "capture.saveFilename": "%id%",
  "capture.saveFilenameMaxLenUtf16": 120,
  "capture.saveFilenameMaxLenUtf8": 240,
  "capture.saveFolder": "WebScrapBook/data",
  "capture.saveOverwrite": false,
  "capture.saveResourcesSequentially": false,
  "capture.saveTo": "folder",
  "capture.script": "remove",
  "capture.serverUploadRetryCount": 3,
  "capture.serverUploadRetryDelay": 2000,
  "capture.serverUploadWorkers": 4,
  "capture.shadowDom": "save",
  "capture.style": "save",
  "capture.styleInline": "save",
  "capture.video": "save",
  "capture.zipCompressLevel": null
 }
}

danny0838 commented 2 months ago

As documented on the known issues page: "Internal stylesheets of another browser extension cannot be captured. This could cause an issue like ads hidden by an ad-blocker extension be still visible in the captured page."

The ad blocker only HIDES the cookie warnings rather than REMOVING them. It does this by injecting an extension stylesheet, which WSB can never capture. That's why the warnings always show up in your captures unless you have actually clicked accept manually.

To solve the issue, you have to actually click accept manually, or switch to another browser extension that really removes the cookie warnings for you. Another approach is to set up a capture helper that removes them, but that is largely site-dependent and would therefore be somewhat impractical.
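For illustration, a site-specific capture helper for this case might look roughly like the following. It would be pasted into the capture helpers setting (`capture.helpers`), with `capture.helpersEnabled` switched on; both are empty/`false` in the options posted above. This is a minimal sketch that assumes the helper format of a `pattern` URL regex (in the same `/.../` string form used by `capture.downLink.urlFilter`) plus a list of `commands`, with a `remove` command taking a CSS selector. The selector `.cc-banner` is a hypothetical placeholder, not nature.com's actual markup: inspect the banner element with the browser's developer tools and substitute its real selector. JSON has no comment syntax, so the placeholder cannot be flagged inside the block itself.

```json
[
  {
    "pattern": "/^https?://www\\.nature\\.com/articles//",
    "commands": [
      ["remove", ".cc-banner"]
    ]
  }
]
```

Because the helper deletes the element from the captured document instead of merely hiding it, the banner should not reappear in the saved page. The trade-off is the one noted above: the pattern and selector have to be worked out separately for each site.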