DrMcCoy / dmc_unrar

A dependency-free, single-file FLOSS unrar library
GNU General Public License v2.0

Proposal: alternative decompression interface #5

Open DrMcCoy opened 5 years ago

DrMcCoy commented 5 years ago

A proposal for an alternative decompression interface that doesn't collect all files on opening, before any decompression takes place. This would be useful when the whole archive is going to be decompressed linearly anyway, especially from a medium where skipping through the whole archive just to find all the files is too expensive.

@fasterthanlime, does that sound reasonable and like something you could use? Or am I going off in a totally wrong direction?

fasterthanlime commented 5 years ago

I was hesitant to open a similar issue just a few minutes ago!

What you outlined fits our use case exactly, with one important omission: extraction should be pausable/resumable, so it's important that the data structures be serializable.

Here's how resumable decompressors operate in butler:

Some examples of checkpoint structs (in Go, sorry), from simplest to hairiest:

These are usually saved as part of a larger structure: for .tar.gz and .tar.bz2, for example, there's tarextractor:

https://github.com/itchio/savior/blob/fa53ef6e95620d2f5583580af460386f1fcb7190/tarextractor/tarextractor.go#L22-L25

(*savior.ExtractResult gets filled progressively with entries containing path, size, permissions, etc. - this is what I need to keep track of).

In dmc_unrar's case, looking at the complexity of actual file decompression, I fear it might be unreasonable to shoot for ResumeSupportBlock, see https://github.com/itchio/savior/blob/a9f8c3af201ef807ef6107294cb36bc3893bb02e/extractor.go#L85-L95 - but ResumeSupportEntry might be easy to achieve with the interface you suggested.

There might not even be a need to save that much internal dmc_unrar data. The checkpoint should hold as little info as possible while still letting dmc_unrar resume from a given entry, so, from my understanding, it would be:

So, for example, const dmc_unrar_file *dmc_unrar_next_file(dmc_unrar_archive *archive, const dmc_unrar_file *file) works great if you do streaming/linear decompression in one execution, but if you have to stop and later resume from a disk checkpoint, then you have no dmc_unrar_file *file to pass.

Do you see what I'm getting at? Hopefully this is not too much!

DrMcCoy commented 5 years ago

Hmm, yes, makes sense.

And yes, continuing from the middle of a file would be really complex. Restarting from the beginning of a file within the archive sounds feasible, though.

It should be possible to extract a few key integer values (offset, mostly, yeah) from the internal state given a dmc_unrar_archive *archive, const dmc_unrar_file *file pair, which you can then squirrel away. Another function would then reconstruct some of the internal state from these values and a newly opened linear archive and spit out the last const dmc_unrar_file *file.
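To make the idea concrete, here is a minimal sketch of an entry-level checkpoint, written against a toy container format (a 1-byte length followed by the payload) rather than RAR. Every name in it (toy_archive, toy_file, toy_checkpoint, toy_next_file, toy_resume and the encode/decode helpers) is hypothetical and exists only for illustration; none of it is part of dmc_unrar, and the real internal state may well need more than an offset and an index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a linearly read archive. NOT the RAR format:
 * each entry is a 1-byte payload length followed by the payload. */
typedef struct {
    const uint8_t *data;
    size_t size;
    size_t offset;  /* offset of the next entry header */
    uint32_t index; /* number of entries already visited */
} toy_archive;

typedef struct {
    const uint8_t *payload;
    uint8_t length;
} toy_file;

/* The few plain integers that have to survive a restart. */
typedef struct {
    uint64_t offset;
    uint32_t index;
} toy_checkpoint;

void toy_open(toy_archive *a, const uint8_t *data, size_t size) {
    assert(a != NULL && data != NULL);
    a->data = data;
    a->size = size;
    a->offset = 0;
    a->index = 0;
}

/* Step to the next entry. Returns 1 on success, 0 at the end of the
 * archive or if the entry is truncated. */
int toy_next_file(toy_archive *a, toy_file *out) {
    if (a->offset >= a->size) return 0;
    uint8_t len = a->data[a->offset];
    if (a->size - a->offset - 1 < len) return 0;
    out->payload = a->data + a->offset + 1;
    out->length = len;
    a->offset += 1 + (size_t)len;
    a->index++;
    return 1;
}

/* Capture the state at an entry boundary, i.e. after one entry has
 * been fully handled and before the next one is started. */
void toy_save_checkpoint(const toy_archive *a, toy_checkpoint *cp) {
    cp->offset = (uint64_t)a->offset;
    cp->index = a->index;
}

/* Rebuild the iteration state on a freshly opened archive.
 * Returns 0 if the checkpoint does not fit this archive. */
int toy_resume(toy_archive *a, const uint8_t *data, size_t size,
               const toy_checkpoint *cp) {
    if (cp->offset > size) return 0;
    toy_open(a, data, size);
    a->offset = (size_t)cp->offset;
    a->index = cp->index;
    return 1;
}

/* Fixed 12-byte little-endian encoding, so the checkpoint can be
 * written to disk independently of struct layout and endianness. */
void toy_checkpoint_encode(const toy_checkpoint *cp, uint8_t out[12]) {
    for (int i = 0; i < 8; i++) out[i] = (uint8_t)(cp->offset >> (8 * i));
    for (int i = 0; i < 4; i++) out[8 + i] = (uint8_t)(cp->index >> (8 * i));
}

void toy_checkpoint_decode(const uint8_t in[12], toy_checkpoint *cp) {
    cp->offset = 0;
    cp->index = 0;
    for (int i = 0; i < 8; i++) cp->offset |= (uint64_t)in[i] << (8 * i);
    for (int i = 0; i < 4; i++) cp->index |= (uint32_t)in[8 + i] << (8 * i);
}
```

The caller would save a checkpoint after each fully extracted entry, persist the 12 encoded bytes alongside its own state, and after a restart decode them and call toy_resume on the reopened archive before continuing with toy_next_file. Resuming in the middle of an entry is deliberately out of scope here.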

Does that sound good?

fasterthanlime commented 5 years ago

> Does that sound good?

Yep, that sounds reasonable!