Closed by iiv3 6 months ago
To prevent potential conflicts among different filesystems, WSB saves all files with all-lower-case names. Just keep this in mind and search for the all-lower-case version of the filename (or look up the URL in index.json and then find the corresponding saved filename).
There is no lower-case version of the file saved. That's why the merge result is broken.
And please, preserve the original case of the file. It's the correct way to handle a case-sensitive filesystem. Otherwise you are going to open a much bigger can of worms. After all, URLs are case-sensitive.
> There is no lower-case version of the file saved. That's why the merge result is broken.
This should not happen. If it does please provide a real case example.
> And please, preserve the original case of the file. It's the correct way to handle a case-sensitive filesystem. Otherwise you are going to open a much bigger can of worms. After all, URLs are case-sensitive.
This is for cross-platform compatibility. First, it's not possible for the browser to detect whether the filesystem is case-sensitive. Additionally, there will be a problem when files are moved to a case-insensitive filesystem if both "image.jpg" and "IMAGE.JPG" exist.
How should I provide a real-case example? The one I've provided is from a capture of the profile page of Twitter's current owner.
You don't need to change the filename if there is no conflict. Different "image.jpg" files could exist in multiple captured pages. You do handle that type of conflict, don't you?
Apparently, the current code already has both case-sensitive and lower-case filenames in different lists.
I wouldn't advise changing the case of non-ASCII filenames.
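To make the suggestion above concrete, here is a minimal sketch of keeping the original case on disk while still detecting names that would clash on a case-insensitive filesystem. It is not WSB's actual code; the function `unique_filename`, the `taken` mapping, and the `-1` suffix scheme are all hypothetical names chosen for illustration.

```python
import os


def unique_filename(name: str, taken: dict) -> str:
    """Return a name that keeps its original case but is unique case-insensitively.

    `taken` (hypothetical) maps a case-folded key to the name actually saved.
    """
    base, ext = os.path.splitext(name)
    candidate = name
    counter = 1
    # Compare on the case-folded key so "image.jpg" and "IMAGE.JPG" count as a clash.
    while candidate.casefold() in taken:
        candidate = f"{base}-{counter}{ext}"
        counter += 1
    # Only the lookup key is folded; the saved name itself is never re-cased.
    taken[candidate.casefold()] = candidate
    return candidate


taken = {}
print(unique_filename("image.jpg", taken))  # image.jpg
print(unique_filename("IMAGE.JPG", taken))  # IMAGE-1.JPG
print(unique_filename("Logo.PNG", taken))   # Logo.PNG
```

With this approach a name is only altered when it would actually collide, and the folded form is used purely as a lookup key.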
I cannot get the exact issue from your description.
Please provide a reproducible case and the exact steps to reproduce the issue: the source URL, the steps you take to run the capture (and the capture options), how you perform the merge capture, and what is wrong in the result.
I see why you can't reproduce it.
I use the "OldTwitter" extension, which always loads the same original-size image, so the images are the same. If you follow my instructions, you get multiple versions with different sizes, and they all get their own new files.
Let me find a simpler site.
OK, the Linux graphics server site is simple enough.
First capture, the home page.
{
"tasks": [
{
"comment": "",
"fullPage": true,
"tabId": 526183075,
"title": "X.Org",
"url": "https://www.x.org/wiki/"
}
],
"bookId": "Temp",
"parentId": "root",
"index": null,
"mode": "",
"delay": null,
"options": {
"capture.applet": "blank",
"capture.audio": "save-current",
"capture.backupForRecapture": true,
"capture.base": "blank",
"capture.canvas": "save",
"capture.contentSecurityPolicy": "remove",
"capture.deleteErasedOnCapture": true,
"capture.deleteErasedOnSave": true,
"capture.downLink.doc.delay": null,
"capture.downLink.doc.depth": 0,
"capture.downLink.doc.mode": "source",
"capture.downLink.doc.urlFilter": "",
"capture.downLink.file.extFilter": "",
"capture.downLink.file.mode": "none",
"capture.downLink.urlExtra": "",
"capture.downLink.urlFilter": "",
"capture.downloadRetryCount": 3,
"capture.downloadRetryDelay": 1000,
"capture.embed": "blank",
"capture.favicon": "save",
"capture.faviconAttrs": "",
"capture.font": "link",
"capture.formStatus": "keep",
"capture.frame": "save",
"capture.frameRename": true,
"capture.helpers": "",
"capture.helpersEnabled": false,
"capture.image": "save-current",
"capture.imageBackground": "save-used",
"capture.insertInfoBar": false,
"capture.linkUnsavedUri": true,
"capture.mergeCssResources": true,
"capture.noscript": "save",
"capture.object": "blank",
"capture.ping": "blank",
"capture.prefetch": "remove",
"capture.preload": "remove",
"capture.prettyPrint": false,
"capture.recordDocumentMeta": true,
"capture.recordRewrites": false,
"capture.referrerPolicy": "strict-origin-when-cross-origin",
"capture.referrerSpoofSource": false,
"capture.remoteTabDelay": 300,
"capture.removeHidden": "undisplayed",
"capture.resourceSizeLimit": null,
"capture.rewriteCss": "url",
"capture.saveAs": "folder",
"capture.saveAsciiFilename": false,
"capture.saveDataUriAsFile": true,
"capture.saveDataUriAsSrcdoc": true,
"capture.saveFileAsHtml": false,
"capture.saveFilename": "%create-Y%.%create-m%/%id%_%source-host%",
"capture.saveFilenameMaxLenUtf16": 120,
"capture.saveFilenameMaxLenUtf8": 240,
"capture.saveFolder": "WebScrapBook/data",
"capture.saveOverwrite": false,
"capture.saveResourcesSequentially": false,
"capture.saveTo": "server",
"capture.script": "remove",
"capture.serverUploadRetryCount": 3,
"capture.serverUploadRetryDelay": 2000,
"capture.serverUploadWorkers": 4,
"capture.shadowDom": "save",
"capture.style": "save",
"capture.styleInline": "save",
"capture.video": "save-current",
"capture.zipCompressLevel": null
}
}
Then the merge capture, on the second link in the first paragraph, "The X.Org Foundation", which leads to an "about" page:
{
"tasks": [
{
"fullPage": true,
"mergeCaptureInfo": {
"bookId": "Temp",
"itemId": "20230813170453252"
},
"tabId": 526183075,
"url": "https://www.x.org/wiki/XorgFoundation/"
}
],
"bookId": "Temp",
"parentId": "20230813170453252",
"index": null,
"mode": "",
"delay": null,
"options": {
"capture.applet": "blank",
"capture.audio": "save-current",
"capture.backupForRecapture": true,
"capture.base": "blank",
"capture.canvas": "save",
"capture.contentSecurityPolicy": "remove",
"capture.deleteErasedOnCapture": true,
"capture.deleteErasedOnSave": true,
"capture.downLink.doc.delay": null,
"capture.downLink.doc.depth": 0,
"capture.downLink.doc.mode": "source",
"capture.downLink.doc.urlFilter": "",
"capture.downLink.file.extFilter": "",
"capture.downLink.file.mode": "none",
"capture.downLink.urlExtra": "",
"capture.downLink.urlFilter": "",
"capture.downloadRetryCount": 3,
"capture.downloadRetryDelay": 1000,
"capture.embed": "blank",
"capture.favicon": "save",
"capture.faviconAttrs": "",
"capture.font": "link",
"capture.formStatus": "keep",
"capture.frame": "save",
"capture.frameRename": true,
"capture.helpers": "",
"capture.helpersEnabled": false,
"capture.image": "save-current",
"capture.imageBackground": "save-used",
"capture.insertInfoBar": false,
"capture.linkUnsavedUri": true,
"capture.mergeCssResources": true,
"capture.noscript": "save",
"capture.object": "blank",
"capture.ping": "blank",
"capture.prefetch": "remove",
"capture.preload": "remove",
"capture.prettyPrint": false,
"capture.recordDocumentMeta": true,
"capture.recordRewrites": false,
"capture.referrerPolicy": "strict-origin-when-cross-origin",
"capture.referrerSpoofSource": false,
"capture.remoteTabDelay": 300,
"capture.removeHidden": "undisplayed",
"capture.resourceSizeLimit": null,
"capture.rewriteCss": "url",
"capture.saveAs": "folder",
"capture.saveAsciiFilename": false,
"capture.saveDataUriAsFile": true,
"capture.saveDataUriAsSrcdoc": true,
"capture.saveFileAsHtml": false,
"capture.saveFilename": "%create-Y%.%create-m%/%id%_%source-host%",
"capture.saveFilenameMaxLenUtf16": 120,
"capture.saveFilenameMaxLenUtf8": 240,
"capture.saveFolder": "WebScrapBook/data",
"capture.saveOverwrite": false,
"capture.saveResourcesSequentially": false,
"capture.saveTo": "server",
"capture.script": "remove",
"capture.serverUploadRetryCount": 3,
"capture.serverUploadRetryDelay": 2000,
"capture.serverUploadWorkers": 4,
"capture.shadowDom": "save",
"capture.style": "save",
"capture.styleInline": "save",
"capture.video": "save-current",
"capture.zipCompressLevel": null
}
}
When you go to the merged page in the archive, the "donate" buttons have lost their images. There is only a small square image placeholder and some text.
I have to remind you that if you run the WSB server on an OS that is not case-sensitive, it will still manage to find the files despite the changed name.
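Since the result depends on where the server's data folder lives, it can help to check that folder directly when trying to reproduce this. The following is a rough sketch of such a probe, run on the server side (the browser cannot do this); `is_case_sensitive` is a hypothetical helper, not part of WSB or its server.

```python
import os
import tempfile


def is_case_sensitive(folder: str) -> bool:
    """Probe whether `folder` treats names differing only in case as distinct."""
    # NamedTemporaryFile generates a lower-case random name; the file is
    # deleted automatically when the `with` block exits.
    with tempfile.NamedTemporaryFile(prefix="probe_", suffix=".tmp", dir=folder) as f:
        name = os.path.basename(f.name)
        # If the upper-case spelling also resolves, the filesystem ignores case.
        return not os.path.exists(os.path.join(folder, name.upper()))


with tempfile.TemporaryDirectory() as folder:
    print(is_case_sensitive(folder))
```

On a typical Linux filesystem this should report a case-sensitive folder, which is the setup where the missing images show up.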
OK. I get it. We may need further investigation for a solution, though.
v2.1.0 should have fixed the issue.
To start a merge capture, you first do a "normal capture" with "Depth to capture linked pages:" set to 0 or more. This creates an index.json file in the capture folder that holds the association between already stored files and their URLs.
The problem is that while the files are stored with their original case on the filesystem, the index.json file lists the path to each file in lowercase.
On a second capture/merge, that path is used to replace already captured resources. As a result, a resource cannot be found if its original URL name contained an upper-case character and the OS file lookup is case-sensitive.
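The mismatch described above can be reproduced outside the extension. This is only an illustrative sketch: the filename `Donate.PNG`, the lower-cased name standing in for an index.json entry, and the helper `resolve_case_insensitive` are all assumptions, not the real index.json layout or WSB code.

```python
import os
import tempfile
from typing import Optional


def resolve_case_insensitive(folder: str, indexed_name: str) -> Optional[str]:
    """Return the on-disk name matching `indexed_name`, ignoring case."""
    entries = os.listdir(folder)
    if indexed_name in entries:
        return indexed_name  # exact match, nothing to repair
    wanted = indexed_name.casefold()
    for entry in entries:
        if entry.casefold() == wanted:
            return entry  # same name, different case
    return None


with tempfile.TemporaryDirectory() as folder:
    # The file is saved with its original case (hypothetical name).
    open(os.path.join(folder, "Donate.PNG"), "wb").close()
    # The index (hypothetically) records the lower-cased name.
    indexed = "donate.png"
    # A direct lookup fails on a case-sensitive filesystem.
    print(os.path.exists(os.path.join(folder, indexed)))
    # The fallback recovers the name actually present on disk.
    print(resolve_case_insensitive(folder, indexed))
```

A fallback like this only works around the symptom; recording the same case in the index as on disk would avoid the mismatch in the first place.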
The capture option "Save ASCII filename" doesn't seem to have any effect in this case. The same goes for "Save data URL as file".
I'm running WebScrapBook 2.0.4 extension on Chromium 114 (Linux).