gen2brain / go-unarr

Go bindings for unarr (decompression library for RAR, TAR, ZIP and 7z archives)
zlib License

Bundled C source code for 7z, rar, tar, zip and bzip2 is a concern; it may be out of date. #42

Closed rwperrott closed 2 weeks ago

rwperrott commented 3 months ago

I looked at this because of surprising unarchiving failures by cbconvert.

I could see "C" references in the Go code, so I assume that all of this C code is compiled for use by the Go code.

e.g. The unarrc/external/zlib/README starts with:

ZLIB DATA COMPRESSION LIBRARY

zlib 1.2.11 is a general purpose data compression library.  All the code is
thread safe.  The data format used by the zlib library is described by RFCs

The current zlib version from https://www.zlib.net/ is zlib 1.3.1, January 22, 2024.

It would probably have been smarter to bind dynamically to a 7z library for all the archive types; maybe a maintained Go library already exists for this. Alternatively, the code could call the 7z/7zz CLI, or archive-specific CLI programs.
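To make the CLI suggestion concrete, here is a minimal sketch of shelling out to 7-Zip from Go. It is not how go-unarr works. The helper names (`sevenZipArgs`, `extract`) and the example file names are hypothetical, and it assumes a `7zz` binary on the PATH that accepts the standard `x`, `-y`, `-o<dir>` and `--` arguments.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// sevenZipArgs builds the argument list for extracting archive into destDir.
// "x" extracts with full paths, "-y" assumes yes on all prompts, "-o<dir>"
// (no space) sets the output directory, and "--" ends switch parsing so an
// archive name starting with "-" is not mistaken for a switch.
func sevenZipArgs(archive, destDir string) []string {
	return []string{"x", "-y", "-o" + destDir, "--", archive}
}

// extract runs the named 7-Zip binary (assumed: "7zz" or "7z") on archive.
func extract(binary, archive, destDir string) error {
	path, err := exec.LookPath(binary)
	if err != nil {
		return fmt.Errorf("%s not found in PATH: %w", binary, err)
	}
	out, err := exec.Command(path, sevenZipArgs(archive, destDir)...).CombinedOutput()
	if err != nil {
		return fmt.Errorf("%s failed: %w\n%s", binary, err, out)
	}
	return nil
}

func main() {
	// With two arguments, actually extract; otherwise only show the command.
	if len(os.Args) == 3 {
		if err := extract("7zz", os.Args[1], os.Args[2]); err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		return
	}
	// Hypothetical file names, printed for illustration only.
	fmt.Println("7zz", strings.Join(sevenZipArgs("comic.cb7", "out"), " "))
}
```

The trade-off is that the program then depends on whatever 7-Zip version the user has installed, which is the opposite of bundling the C sources.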

This smells off.

rwperrott commented 3 months ago

I had a look for 7z stuff on https://pkg.go.dev/, and saw:

unarr (github.com/gen2brain/go-unarr)

Package unarr is a decompression library for RAR, TAR, ZIP and 7z archives.
Imported by 36
| v0.2.3 published on Apr 23, 2024 | Zlib

So, correction: it is maintained. However, the bundled C source code still seems off.

gen2brain commented 3 months ago

C sources are updated when there is a need, and I don't see one. The usual distros probably have even older versions.

rwperrott commented 3 months ago

True, I noticed that with other people's, yours being the most recent upload! I decided to write my own tool, as a bash script, to process huge CB* files and strip out junk. It uses 7zz (oddly, the renamed, more recent 7z) rather than the very dated 7z version used by various Linux tools, and runs concurrent executions of mogrify from ImageMagick 7, which I built and installed directly because it is not available in the default repo. It uses find for recursive search/execution. 7z can automatically handle a lot of different archive features itself, like ZIP64, and is very fast. I was surprised that WEBP files can actually be larger than JPEG ones! AVIF is a non-starter currently because it is not supported by much yet.

gen2brain commented 3 months ago

The plan is to use https://github.com/gen2brain/jpegli for JPEG, resulting in much smaller files. I don't see the point in using 7z or RAR; there is very little to gain as images are already compressed, and proprietary formats like RAR just create trouble. The real gain comes from e.g. using jpegli, resizing to an appropriate size (i.e. instead of 6000x5000), converting black&white images to 4-bit uncompressed BMP and then compressing, etc. IMO only .cbz should exist, or even better .cbt, but now it is too late. Anyway, all these formats are unofficial and were created by accident.

jwillikers commented 2 weeks ago

I'm working on a package for cbconvert for Nix, and it seems like it would be helpful if it were possible to opt in to using external libraries somehow.

gen2brain commented 2 weeks ago

@jwillikers I could add an extlib tag, like I usually do, to use the external libunarr library. I didn't because at the time it was not packaged for any distro; I'm not sure about the status now. What I don't want is a setup where some parts of the code are bundled and some are not; that would get complicated. I want to see if I can use libarchive with purego, and if that works I will probably migrate cbconvert to libarchive (because of RAR 5).
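For readers unfamiliar with build tags: a tag such as extlib is normally passed as `go build -tags extlib` and selects between files guarded by `//go:build extlib` and `//go:build !extlib`. Which files go-unarr actually guards this way is not stated in the thread, so the two constraint lines below are assumptions. The sketch uses the standard `go/build/constraint` package to show which of the two hypothetical files would be compiled in each case; it only considers the tags listed explicitly and ignores implicit ones such as the OS or `cgo`.

```go
package main

import (
	"fmt"
	"go/build/constraint"
)

// selected reports whether a file carrying the given //go:build line would
// be compiled when exactly the listed tags are enabled. Implicit tags
// (GOOS, GOARCH, cgo, ...) are deliberately ignored in this sketch.
func selected(line string, tags ...string) (bool, error) {
	expr, err := constraint.Parse(line)
	if err != nil {
		return false, err
	}
	set := make(map[string]bool, len(tags))
	for _, t := range tags {
		set[t] = true
	}
	return expr.Eval(func(tag string) bool { return set[tag] }), nil
}

func main() {
	// Assumed layout: one file links the system libunarr, the other
	// compiles the bundled C sources.
	lines := []string{"//go:build extlib", "//go:build !extlib"}
	for _, line := range lines {
		def, err := selected(line)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		ext, _ := selected(line, "extlib")
		fmt.Printf("%-20s default=%v  -tags extlib=%v\n", line, def, ext)
	}
}
```

Keeping the switch at the level of a single tag matches the concern above about not mixing bundled and external parts: exactly one of the two files is compiled in any given build.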

jwillikers commented 2 weeks ago

libunarr is currently packaged in Nix, so that wouldn't be a problem for this use case. I think all that would be needed is an extlib tag. Using libarchive with purego seems like it would also solve the issue, since I'd just have to specify a dependency on good ol' libarchive for the package.

gen2brain commented 2 weeks ago

@jwillikers extlib option is added, and cbconvert now uses the latest go-unarr.

jwillikers commented 2 weeks ago

@gen2brain Thanks! That totally works now and I'm able to build everything!