danny0838 / webscrapbook

A browser extension that captures web pages to a local device or backend server for future retrieval, organization, annotation, and editing. This project inherits from the legacy Firefox add-on ScrapBook X.
Mozilla Public License 2.0

can't setup PyWebScrapbook to show saved pages. #372

Closed seregaMetbovb closed 8 months ago

seregaMetbovb commented 8 months ago
The browser extension saves web pages properly to the WebScrapBook directory, but they don't appear in the opened WebScrapBook panel.

A new folder or a new separator shows up in the panel after I create it. The wsb command-line log reports "200 - OK" for everything. I set "no_tree = false" and use http://localhost:8099/. The firewall is disabled; OS: Ubuntu 23.10; Firefox 123. I run wsb in a Python virtual environment:

wsb --root /home/user/Downloads/WebScrapBook serve

I tried these version combinations without success:
WebScrapBook 2.8 and PyWebScrapBook 2.3.0
WebScrapBook 1.14.1 and PyWebScrapBook 1.16.0

Sample output of wsb:

127.0.0.1 - - [07/Mar/2024 19:46:55] "GET /?a=config&f=json&ts=1709826415175 HTTP/1.1" 200 -
127.0.0.1 - - [07/Mar/2024 19:46:55] "GET /tree/?a=list&f=json HTTP/1.1" 200 -
127.0.0.1 - - [07/Mar/2024 19:46:55] "GET /tree/toc.js HTTP/1.1" 200 -
127.0.0.1 - - [07/Mar/2024 19:46:55] "GET /tree/meta.js HTTP/1.1" 200 -
127.0.0.1 - - [07/Mar/2024 19:47:04] "GET /tree/?a=list&f=json HTTP/1.1" 200 -
127.0.0.1 - - [07/Mar/2024 19:47:04] "GET /?a=config&f=json&ts=1709826424461 HTTP/1.1" 200 -
127.0.0.1 - - [07/Mar/2024 19:47:04] "GET /tree/?a=list&f=json HTTP/1.1" 200 -
127.0.0.1 - - [07/Mar/2024 19:47:04] "GET /tree/meta.js HTTP/1.1" 304 -
127.0.0.1 - - [07/Mar/2024 19:47:04] "GET /tree/toc.js HTTP/1.1" 304 -
127.0.0.1 - - [07/Mar/2024 19:48:41] "GET /tree/?a=list&f=json HTTP/1.1" 200 -
127.0.0.1 - - [07/Mar/2024 19:48:41] "GET /?a=config&f=json&ts=1709826521149 HTTP/1.1" 200 -
127.0.0.1 - - [07/Mar/2024 19:48:41] "GET /tree/?a=list&f=json HTTP/1.1" 200 -
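
Every request in this output is a GET, i.e. the panel reading the config and tree files; nothing in it looks like a capture being written through the server. A minimal Python sketch for tallying such a log, assuming the access-log line format shown above (the `summarize` helper and the `REQUEST` pattern are hypothetical names used only for illustration):

```python
import re
from collections import Counter

# Three entries copied from the wsb output above.
LOG = '''
127.0.0.1 - - [07/Mar/2024 19:46:55] "GET /?a=config&f=json&ts=1709826415175 HTTP/1.1" 200 -
127.0.0.1 - - [07/Mar/2024 19:46:55] "GET /tree/?a=list&f=json HTTP/1.1" 200 -
127.0.0.1 - - [07/Mar/2024 19:47:04] "GET /tree/meta.js HTTP/1.1" 304 -
'''

# Matches the quoted request line and the status code that follows it.
REQUEST = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})'
)

def summarize(log_text):
    """Count HTTP methods and status codes found in an access log."""
    methods = Counter()
    statuses = Counter()
    for match in REQUEST.finditer(log_text):
        methods[match["method"]] += 1
        statuses[match["status"]] += 1
    return methods, statuses

methods, statuses = summarize(LOG)
print("methods:", dict(methods))
print("statuses:", dict(statuses))
```

If only GET shows up while captures are being made, the extension is probably not sending the captured data to the server at all.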

danny0838 commented 8 months ago

Please provide your capture options (they can be copied from "Capture as" => "Advanced") and the log message in the capture dialog.

seregaMetbovb commented 8 months ago

{
  "tasks": [
    {
      "tabId": 3,
      "url": "https://addons.mozilla.org/en-US/firefox/addon/webscrapbook/versions/?utm_content=search&utm_medium=referral&utm_source=addons.mozilla.org",
      "title": "WebScrapBook version history - 25 versions – Add-ons for Firefox (en-US)",
      "comment": ""
    }
  ],
  "bookId": null,
  "parentId": "root",
  "index": null,
  "mode": "",
  "delay": null,
  "options": {
    "capture.serverUploadWorkers": 4,
    "capture.serverUploadRetryCount": 3,
    "capture.serverUploadRetryDelay": 2000,
    "capture.downloadWorkers": 4,
    "capture.downloadRetryCount": 3,
    "capture.downloadRetryDelay": 1000,
    "capture.saveTo": "folder",
    "capture.saveFolder": "WebScrapBook/data",
    "capture.saveAs": "folder",
    "capture.saveFilename": "%id%",
    "capture.saveFilenameMaxLenUtf16": 120,
    "capture.saveFilenameMaxLenUtf8": 240,
    "capture.saveAsciiFilename": false,
    "capture.saveOverwrite": false,
    "capture.saveFileAsHtml": false,
    "capture.saveDataUriAsFile": true,
    "capture.saveDataUriAsSrcdoc": true,
    "capture.saveResourcesSequentially": false,
    "capture.resourceSizeLimit": null,
    "capture.image": "save",
    "capture.imageBackground": "save-used",
    "capture.favicon": "save",
    "capture.faviconAttrs": "",
    "capture.canvas": "save",
    "capture.audio": "save",
    "capture.video": "save",
    "capture.embed": "blank",
    "capture.object": "blank",
    "capture.applet": "blank",
    "capture.frame": "save",
    "capture.frameRename": true,
    "capture.font": "save-used",
    "capture.style": "save",
    "capture.styleInline": "save",
    "capture.rewriteCss": "url",
    "capture.mergeCssResources": true,
    "capture.script": "remove",
    "capture.noscript": "save",
    "capture.contentSecurityPolicy": "remove",
    "capture.ping": "blank",
    "capture.preload": "remove",
    "capture.prefetch": "remove",
    "capture.base": "blank",
    "capture.formStatus": "keep",
    "capture.shadowDom": "save",
    "capture.removeHidden": "none",
    "capture.linkUnsavedUri": false,
    "capture.downLink.file.mode": "none",
    "capture.downLink.file.extFilter": "###image\n#bmp, gif, ico, jpg, jpeg, jpe, jp2, png, tif, tiff, svg\n###audio\n#aac, ape, flac, mid, midi, mp3, ogg, oga, ra, ram, rm, rmx, wav, wma\n###video\n#avc, avi, flv, mkv, mov, mpg, mpeg, mp4, wmv\n###archive\n#zip, rar, jar, bz2, gz, tar, rpm, 7z, 7zip, xz, jar, xpi, lzh, lha, lzma\n#/z[0-9]{2}|r[0-9]{2}/\n###document\n#pdf, doc, docx, xls, xlsx, ppt, pptx, odt, ods, odp, odg, odf, rtf, txt, csv\n###executable\n#exe, msi, dmg, bin, xpi, iso\n###any non-web-page\n#/(?!$|html?|xht(ml)?|php|py|pl|aspx?|cgi|jsp)(.*)/i",
    "capture.downLink.doc.depth": null,
    "capture.downLink.doc.delay": null,
    "capture.downLink.doc.mode": "source",
    "capture.downLink.doc.urlFilter": "",
    "capture.downLink.urlFilter": "###skip common logout URL\n/[/=]logout\b/i",
    "capture.downLink.urlExtra": "",
    "capture.referrerPolicy": "",
    "capture.referrerSpoofSource": false,
    "capture.recordDocumentMeta": true,
    "capture.recordRewrites": false,
    "capture.prettyPrint": false,
    "capture.insertInfoBar": false,
    "capture.helpersEnabled": false,
    "capture.helpers": "",
    "capture.remoteTabDelay": null,
    "capture.deleteErasedOnCapture": true,
    "capture.deleteErasedOnSave": false,
    "capture.backupForRecapture": true,
    "capture.zipCompressLevel": null
  }
}

And the capture dialog log:

Capturing (document) [1] https://dotnet.microsoft.com/en-us/download/dotnet-framework/net481?cid=getdotnetframework ...
Saving data...
Saved to "/home/user/Downloads/WebScrapBook/data/20240309025439516/index.html"
Done.

danny0838 commented 8 months ago

You need to set "Save captured data to:" to "Backend server". See the doc for more details.
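
For anyone checking the same thing: the options posted above contain "capture.saveTo": "folder", which matches the capture log showing the page saved under the Downloads folder rather than through the server. A minimal Python sketch for checking a copied options block, assuming the "Backend server" choice is stored as the value "server" (an assumption, not confirmed in this thread); `saves_to_backend` is a hypothetical helper name:

```python
import json

# Excerpt of the capture options posted above; only the relevant keys are kept.
posted = json.loads('''
{
  "options": {
    "capture.saveTo": "folder",
    "capture.saveFolder": "WebScrapBook/data"
  }
}
''')

def saves_to_backend(task, backend_value="server"):
    """Return True if the capture task targets the backend server.

    backend_value is an assumption about how the extension encodes the
    "Backend server" choice; adjust it if your options show another value.
    """
    return task.get("options", {}).get("capture.saveTo") == backend_value

print(saves_to_backend(posted))
```

For the options in this thread the check fails, because captures are written to a local folder and the server is never asked to add them to the tree.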

seregaMetbovb commented 8 months ago

Thanks a lot. I did not notice this feature. Now WebScrapBook saves everything properly.