ccloli / E-Hentai-Downloader

Download E-Hentai archive as zip file
GNU General Public License v3.0

Corrupted archive problem #216

Closed 424778940z closed 2 years ago

424778940z commented 2 years ago

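As the title says. I've hit this a few times before and re-downloading always fixed it, but /g/1951010/d9a48bec3c/ seems to come out corrupted no matter what. I downloaded it four times and every copy was corrupted. Based on archives that broke in the past, I looked into the zip file: the data is actually there, but a chunk at the tail seems to have been cut off.

Viewed in 010 Editor, a standard zip should contain: struct ZIPFILERECORD record, struct ZIPDIRENTRY dirEntry, struct ZIPENDLOCATOR endLocator

But the files this script generates (including the good ones) only have: ZIPFILERECORD record

So maybe this zip isn't quite standard? But the problem doesn't seem to be the two missing structures either? I haven't dug any deeper; it's hard to open ex and slack off during work hours, lol.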
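For anyone who wants to check the three structures without 010 Editor, here is a minimal Python sketch that counts the record signatures in a file and shows what a cut-off tail looks like. The constant names, the sample file names and the `count_zip_records` helper are made up for illustration and are not part of the script. The scan is naive: compressed image data could contain the same byte sequences by chance, so treat the counts as a hint rather than proof.

```python
import io
import zipfile

LOCAL_FILE_HEADER = b"PK\x03\x04"   # shown as ZIPFILERECORD in 010 Editor
CENTRAL_DIR_ENTRY = b"PK\x01\x02"   # shown as ZIPDIRENTRY
END_OF_CENTRAL_DIR = b"PK\x05\x06"  # shown as ZIPENDLOCATOR


def count_zip_records(data: bytes) -> dict:
    """Naively count how often each record signature occurs in the bytes."""
    return {
        "file_records": data.count(LOCAL_FILE_HEADER),
        "dir_entries": data.count(CENTRAL_DIR_ENTRY),
        "end_locators": data.count(END_OF_CENTRAL_DIR),
    }


def make_sample_zip() -> bytes:
    """Build a small, uncompressed two-file archive in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("001.jpg", b"first page")
        zf.writestr("002.jpg", b"second page")
    return buf.getvalue()


complete = make_sample_zip()
# Simulate a lost tail: drop everything from the first central directory entry on.
truncated = complete[: complete.find(CENTRAL_DIR_ENTRY)]

print(count_zip_records(complete))
print(count_zip_records(truncated))
```

A complete archive should report one directory entry per file record plus a single end locator. An archive whose tail was lost keeps its file records but reports no directory entries and no end locator, which would match the "only ZIPFILERECORD" symptom described above.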

ccloli commented 2 years ago

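I'll look into it later. In the meantime, could you share your environment: the browser, the userscript extension, and the script's settings?

The easy way is to open any gallery, press F12 and switch to the Console. You don't need to start a download; just copy and paste the logs that have already been printed.

My first guess is that file system storage was in use and a written chunk got lost? Or that the system drive ran low on free space and blob storage filled up? I'd need to see the actual script settings to narrow it down.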

424778940z commented 2 years ago

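Chrome + Tampermonkey. As far as I remember, the only script setting I changed was the output file name format. Everything I have is on the C: drive, which currently has 150 GB free.

Logs: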

[EHD] E-Hentai Downloader is running.
[EHD] Bugs Report > https://github.com/ccloli/E-Hentai-Downloader/issues | https://greasyfork.org/scripts/10379-e-hentai-downloader/feedback
[EHD] To report a bug, it's recommended to provide the logs started with "[EHD]", thanks. =w=
[EHD] UserAgent > Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36
[EHD] Script Handler > Tampermonkey
[EHD] Script Handler Version > 4.16.1
[EHD] E-Hentai Downloader Version > 1.34.2
[EHD] Current URL > https://exhentai.org/g/1951010/d9a48bec3c/
[EHD] Is Logged In > true
[EHD] E-Hentai Downloader Setting > {"thread-count":10,"speed-detect":true,"number-images":true,"number-separator":"_","number-real-index":false,"number-auto-retry":true,"auto-download-cancel":true,"file-name":"{subtitle}","recheck-file-name":false,"ignore-torrent":true,"status-in-title":"blur","hide-image-limits":false,"hide-estimated-cost":false,"file-descriptor":true,"force-resized":false,"never-new-url":false,"never-send-nl":false,"never-warn-large-gallery":false,"store-in-fs":true,"play-silent-music":true,"save-info":"file","save-info-list":["title","metas","uploader-comment","page-links"],"save-info-list[]":false,"replace-with-full-width":false,"force-pause":false,"save-as-cbz":false,"pass-cookies":false,"force-as-login":false}
[EHD] Request Resolution Setting
[EHD] Request Image Limits From e-hentai.org
[EHD] File System is opened! Name > https_exhentai.org_0:Temporary
[EHD] File 1951010.zip is removed.
DevTools failed to load source map: Could not load content for chrome-extension://lcghoajegeldpfkfaejegfobkapnemjl/lib/browser-polyfill.js.map: System error: net::ERR_BLOCKED_BY_CLIENT
[EHD] Resolution Setting > {"withoutHentaiAtHome":0,"resolution":0,"timestamp":1656335004496}
[EHD] Request Image Limits From e-hentai.org
[EHD] Request Image Limits From e-hentai.org

ccloli commented 2 years ago

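> But /g/1951010/d9a48bec3c/ seems to come out corrupted no matter what. I downloaded it four times and every copy was corrupted.

Estimated Limits Cost: 31 + 6941 GP

I got off work too late, and the gallery is old enough that downloading it now costs GP, so I'll try it another time 🤣

"file-descriptor":true

This setting looks a bit suspicious. I'll try it with other galleries in a moment.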

ccloli commented 2 years ago

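I tried downloading a few galleries, and the files were fine whether or not file-descriptor was enabled (I didn't test the gallery from this issue; it costs a bit too much GP).

Also, going by the notes and code I wrote a while ago for parsing zip files, these files do conform to the standard structure. In other words, all three kinds of zip records listed above are present.

Long image warning ![image](https://user-images.githubusercontent.com/8115912/175976665-b0064b40-18e5-46ae-87b6-1a2591bc953b.png)

I don't know why the script I wrote back then can't actually extract the files, though. Either the file format is a bit odd or my code has a bug 🤔 (it certainly has bugs; at the very least it parses the zip timestamps incorrectly)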

image

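Still, since both WinRAR and 7-Zip can extract it normally, we can assume for now that the format is at least recognized by mainstream archivers and extracts correctly.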

"store-in-fs":true

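As for the missing structures mentioned in the issue, I suspect that in some edge cases a few file chunks are not written to the FileSystem properly. Before testing the gallery above any further, you could try applying the following settings together: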

424778940z commented 2 years ago

"Stream files and create Zip with file descriptors" "Use File System to handle large Zip file"

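After turning off these two options and downloading again, the resulting zip is fine. Comparing the two files, and setting aside the missing tail for now, the PK header at the very start of the broken zip looks wrong: many of its fields are 0.

The working zip, where the PK header fields are all populated: [image: good]. The broken zip, where many PK header fields are 0000: [image: bad].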
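The same contrast between the two headers can be reproduced offline. The sketch below uses Python's standard `zipfile` module rather than the script's own zip library, so it is only an analogy: when `zipfile` is given an output it cannot seek in, it sets general purpose bit 3 and leaves the CRC and size fields of the local header at zero. `WriteOnlyStream`, `first_local_header` and `make_zip` are hypothetical helpers written for this example.

```python
import io
import struct
import zipfile

# Fixed-size part of a ZIP local file header (30 bytes, little-endian).
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")


class WriteOnlyStream:
    """Hypothetical wrapper exposing no seek()/tell(), so zipfile has to stream."""

    def __init__(self) -> None:
        self._buf = io.BytesIO()

    def write(self, data: bytes) -> int:
        return self._buf.write(data)

    def flush(self) -> None:
        self._buf.flush()

    def getvalue(self) -> bytes:
        return self._buf.getvalue()


def first_local_header(data: bytes) -> dict:
    """Decode the CRC, sizes and descriptor flag of the first local header."""
    (sig, _ver, flags, _method, _time, _date,
     crc, csize, usize, _name_len, _extra_len) = LOCAL_HEADER.unpack_from(data, 0)
    if sig != b"PK\x03\x04":
        raise ValueError("data does not start with a local file header")
    return {
        "descriptor_flag": bool(flags & 0x0008),  # general purpose bit 3
        "crc32": crc,
        "compressed_size": csize,
        "uncompressed_size": usize,
    }


def make_zip(stream) -> bytes:
    """Write a one-file, uncompressed archive to the given stream."""
    with zipfile.ZipFile(stream, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("001.jpg", b"first page")
    return stream.getvalue()


buffered = first_local_header(make_zip(io.BytesIO()))       # seekable output
streamed = first_local_header(make_zip(WriteOnlyStream()))  # unseekable output

print("buffered:", buffered)
print("streamed:", streamed)
```

If the "bad" header has bit 3 set, zeroed CRC and size fields there are a sign of the streaming layout and do not by themselves indicate damage.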

ccloli commented 2 years ago

> The broken zip, where many PK header fields are 0000

That is actually expected. The setting corresponds to this JSZip option:

streamFiles option

In a zip file, the size and the crc32 of the content are placed before the actual content: to write it we must process the whole file. When this option is false (the default) the processed file is held in memory. It takes more memory but generates a zip file which should be read by every program. When this options is true, we stream the file and use data descriptors at the end of the entry. This option uses less memory but some program might not support data descriptors (and won’t accept the generated zip file).

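In other words, the zero-filled fields are only placeholders. The actual values sit at the end of the file record, where they appear as the zip spanning marker, which is the part highlighted in yellow in the screenshot below:

Screenshot_20220628-100057_Chrome

The details of this part of the data are also covered in PKZIP's official documentation.

SmartSelect_20220628-101208_Termux

So, setting aside possible file corruption, this file format is as expected.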
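To make the placeholder layout concrete, here is a small Python sketch that hand-builds one streamed entry and reads the real values back from its trailing record. It assumes a stored (uncompressed) entry and a descriptor that carries the `PK\x07\x08` signature, which the PKZIP documentation treats as optional. `build_streamed_entry` and `read_streamed_entry` are illustrative names, not functions from the script or from JSZip.

```python
import struct
import zlib

LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")  # 30-byte fixed local file header
DATA_DESCRIPTOR = struct.Struct("<4sIII")       # signature, crc32, compressed size, uncompressed size
DESCRIPTOR_SIGNATURE = b"PK\x07\x08"
USES_DESCRIPTOR = 0x0008                        # general purpose bit 3


def build_streamed_entry(name: bytes, payload: bytes) -> bytes:
    """Hand-build one stored entry whose header CRC/size fields are zero placeholders."""
    header = LOCAL_HEADER.pack(
        b"PK\x03\x04", 20, USES_DESCRIPTOR, 0, 0, 0,
        0, 0, 0,             # crc32, compressed size, uncompressed size: placeholders
        len(name), 0,
    )
    descriptor = DATA_DESCRIPTOR.pack(
        DESCRIPTOR_SIGNATURE, zlib.crc32(payload), len(payload), len(payload)
    )
    return header + name + payload + descriptor


def read_streamed_entry(entry: bytes) -> dict:
    """Compare the header placeholders with the values in the trailing descriptor."""
    (sig, _ver, flags, _method, _time, _date,
     crc, csize, usize, name_len, extra_len) = LOCAL_HEADER.unpack_from(entry, 0)
    if sig != b"PK\x03\x04" or not (flags & USES_DESCRIPTOR):
        raise ValueError("not a streamed local file record")
    # Assumption of this sketch: the descriptor is the last 16 bytes of the entry.
    d_sig, d_crc, d_csize, d_usize = DATA_DESCRIPTOR.unpack_from(
        entry, len(entry) - DATA_DESCRIPTOR.size
    )
    if d_sig != DESCRIPTOR_SIGNATURE:
        raise ValueError("trailing data descriptor is missing or damaged")
    start = LOCAL_HEADER.size + name_len + extra_len
    payload = entry[start:start + d_csize]
    return {
        "header": (crc, csize, usize),
        "descriptor": (d_crc, d_csize, d_usize),
        "crc_ok": zlib.crc32(payload) == d_crc,
    }


entry = build_streamed_entry(b"001.jpg", b"first page")
info = read_streamed_entry(entry)
print(info)
```

With this layout a reader depends on the trailing record (or on the central directory) to learn the sizes, so a file whose tail was lost would be much harder to recover than one with fully populated headers.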

424778940z commented 2 years ago

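Hmm, so maybe the stream still wasn't saved completely, which made it impossible to parse? Anyway, it works after turning those two options off, and a plain zip doesn't seem to cause any problems.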

ccloli commented 2 years ago

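I tried this gallery over the weekend with both Stream files and create Zip with file descriptors and File System enabled. The downloaded file passed WinRAR's test and extracted normally, but 7-Zip and an early version of WinZip indeed could not recognize it. For now, let's assume the streamFiles option has compatibility problems with quite a few programs.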

image