bug: not tolerating archives with bad/missing meta data, but which are extractable

gen2brain / cbconvert

CBconvert is a Comic Book converter

GNU General Public License v3.0

195 stars 13 forks

bug: not tolerating archives with bad/missing meta data, but which are extractable #36

Open rwperrott opened 1 month ago

rwperrott commented 1 month ago

On Linux, common desktop archive tools such as XArchiver can open these defective archives. The defect can be something trivial, like a missing central directory at the end of zip (cbz) files, or a missing end-of-file marker. It's annoying having to handle these files by extracting and then recompressing them so that they can be reprocessed. I suspect that switching to 7z libraries for archives could fix this.
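To make the zip case concrete, here is a minimal Go sketch of how a damaged cbz could be detected before conversion. It looks for the end-of-central-directory signature (`PK\x05\x06`) in the tail of the file. The helper name `hasEOCD` is hypothetical and is not part of cbconvert or go-unarr. Finding the signature is only a heuristic and does not prove the archive is intact.

```go
package main

import (
	"bytes"
	"fmt"
)

// eocdSignature is the 4-byte marker "PK\x05\x06" that starts the
// end-of-central-directory (EOCD) record of a zip file.
var eocdSignature = []byte{'P', 'K', 0x05, 0x06}

// hasEOCD (hypothetical helper) reports whether data contains an EOCD
// signature in its tail. The record is 22 bytes plus an optional comment
// of at most 65535 bytes, so only that many trailing bytes are searched.
func hasEOCD(data []byte) bool {
	const maxTail = 22 + 65535
	start := len(data) - maxTail
	if start < 0 {
		start = 0
	}
	return bytes.LastIndex(data[start:], eocdSignature) >= 0
}

func main() {
	// An empty zip archive is just an EOCD record:
	// the signature followed by 18 zero bytes.
	empty := append([]byte{'P', 'K', 0x05, 0x06}, make([]byte, 18)...)

	fmt.Println(hasEOCD(empty))     // true
	fmt.Println(hasEOCD(empty[4:])) // false: signature stripped
}
```

A file that fails this check but still extracts with a desktop tool is the kind of archive described above.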

rwperrott commented 1 month ago

I think this issue is largely caused by cbconvert not recognising input files over 2GB in size, or Zip64 archives. I have seen comic cbz files containing several volume directories that exceeded 2GB in size.

rwperrott commented 1 month ago

As a workaround I've been using 7zz and this article: https://www.baeldung.com/linux/batch-convert-image-formats

rwperrott commented 1 month ago

I suspect that these bugs are caused by int32-only archive-size support in the dependency gen2brain/go-unarr (https://pkg.go.dev/github.com/gen2brain/go-unarr), so it can't handle archive files whose size needs an int64, even for non-zip archives! cbconvert still failed when I re-archived .zip/cbz files over 2GB as .7z and .tar files and tried to convert the contained PNGs to a zip of WebPs.

gen2brain commented 1 month ago

I think the relevant issue is https://github.com/selmf/unarr/issues/15. There are no plans to add support for different archive libraries because they don't exist. I was thinking about adding support for libarchive, but I would have to create new bindings, probably with purego (i.e. dlopen), and then maybe fall back to unarr. I don't have the time for that currently.