Open DrMcCoy opened 5 years ago
I was hesitant to open a similar issue just a few minutes ago!
What you outlined fits our use case exactly, with one important omission: extraction should be pausable/resumable, so it's important that the data structures be serializable.
Here's how resumable decompressors operate in butler:
Some examples of checkpoint structs (in Golang, sorry), from simplest to hairiest:
These are usually saved as part of a larger structure: for .tar.gz and .tar.bz2, for example, there's tarextractor:
(*savior.ExtractResult gets filled progressively with entries containing path, size, permissions, etc.; this is what I need to keep track of).
In dmc_unrar's case, looking at the complexity of actual file decompression, I fear it might be unreasonable to shoot for ResumeSupportBlock (see https://github.com/itchio/savior/blob/a9f8c3af201ef807ef6107294cb36bc3893bb02e/extractor.go#L85-L95), but ResumeSupportEntry might be easy to achieve with the interface you suggested.
There might not even be a need to save that much internal dmc_unrar data. It should be as little info as possible while still letting dmc_unrar resume from a given entry, so, from my understanding, it would be:
So, for example, const dmc_unrar_file *dmc_unrar_next_file(dmc_unrar_archive *archive, const dmc_unrar_file *file) works great if you do streaming/linear decompression in one execution, but if you have to stop/resume from a disk checkpoint, then you have no dmc_unrar_file *file to pass.
Do you see what I'm getting at? Hopefully this is not too much!
Hmm, yes, makes sense.
And yes, continuing from the middle of a file would be really complex. Restarting from the beginning of a file within the archive sounds feasible, though.
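A minimal sketch of what "restart from the beginning of a file" could look like, using in-memory stand-ins rather than dmc_unrar itself. Every name here (sketch_entry, sketch_checkpoint, sketch_extract) is hypothetical, and the `budget` parameter only simulates an interruption. The point is that the checkpoint advances only after an entry is written completely, so a resume redoes the interrupted entry from its start.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an archive entry; not a dmc_unrar type. */
typedef struct {
    const char *data; /* payload to "extract" */
    size_t size;      /* payload size in bytes */
} sketch_entry;

/* Entry-granular checkpoint: just the index of the next entry to extract. */
typedef struct {
    size_t next_entry;
} sketch_checkpoint;

/* Extracts entries back to back into `out`, starting at cp->next_entry.
 * Stops once `budget` bytes have been written, which simulates being
 * interrupted, possibly in the middle of an entry.
 * Returns 1 when every entry is done, 0 when interrupted. */
int sketch_extract(const sketch_entry *entries, size_t count,
                   char *out, size_t budget, sketch_checkpoint *cp)
{
    assert(entries != NULL && out != NULL && cp != NULL);

    /* Output position of the entry we resume at: the total size of all
     * entries that were already extracted completely. */
    size_t offset = 0;
    for (size_t i = 0; i < cp->next_entry && i < count; i++)
        offset += entries[i].size;

    for (size_t i = cp->next_entry; i < count; i++) {
        size_t n = entries[i].size;
        if (n > budget) {
            /* Interrupted mid-entry: a partial write happens, but the
             * checkpoint stays at the start of this entry. */
            memcpy(out + offset, entries[i].data, budget);
            return 0;
        }
        memcpy(out + offset, entries[i].data, n);
        offset += n;
        budget -= n;
        cp->next_entry = i + 1; /* only now is the entry considered done */
    }
    return 1;
}
```

On resume, the partially written entry is simply overwritten from its first byte, so no decompressor state from the middle of a file has to survive the restart.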
It should be possible to extract a few key integer values (offset, mostly, yeah) from the internal state given a dmc_unrar_archive *archive, const dmc_unrar_file *file pair, which you can then squirrel away. Another function would then reconstruct some of the internal state from these values and a newly opened linear archive and spit out the last const dmc_unrar_file *file.
Does that sound good?
Yep, that sounds reasonable!
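To make the "few key integer values" idea concrete, here is a rough sketch of a save/load pair for such a checkpoint. It is an assumption throughout: the thread does not settle which values dmc_unrar would need, so the two fields (an entry index and an archive offset) and all `sketch_` names are hypothetical. The encoding is fixed-width little-endian so the bytes can be written to disk and read back on any platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_STATE_SIZE 16 /* two 64-bit integers */

/* Hypothetical resume state; the real set of values is undecided. */
typedef struct {
    uint64_t entry_index;  /* which entry to restart at */
    uint64_t entry_offset; /* where that entry's block starts in the archive */
} sketch_resume_state;

/* Writes v as 8 little-endian bytes. */
static void sketch_put_u64(uint8_t *dst, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        dst[i] = (uint8_t)(v >> (8 * i));
}

/* Reads 8 little-endian bytes. */
static uint64_t sketch_get_u64(const uint8_t *src)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)src[i] << (8 * i);
    return v;
}

/* Serializes the state into exactly SKETCH_STATE_SIZE bytes. */
void sketch_state_save(const sketch_resume_state *s, uint8_t *out)
{
    assert(s != NULL && out != NULL);
    sketch_put_u64(out, s->entry_index);
    sketch_put_u64(out + 8, s->entry_offset);
}

/* Restores the state; returns 0 if the buffer has the wrong length. */
int sketch_state_load(sketch_resume_state *s, const uint8_t *in, size_t len)
{
    assert(s != NULL && in != NULL);
    if (len != SKETCH_STATE_SIZE)
        return 0;
    s->entry_index = sketch_get_u64(in);
    s->entry_offset = sketch_get_u64(in + 8);
    return 1;
}
```

A caller would store these 16 bytes alongside its own checkpoint data and, after reopening the archive, hand the loaded values to whatever function rebuilds the internal state.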
A proposal for an alternative decompression interface that doesn't collect all files on opening, before any decompression takes place. This would be useful if a linear decompression of the whole file is intended, especially if done from a medium where skipping through the whole file just to find all the files is too expensive.
- dmc_unrar_archive_open_*_linear() functions to mirror the usual dmc_unrar_archive_open_*() functions, that don't fill in the file structures in the archive (and therefore, dmc_unrar_get_file_count() will return 0 for these).
- const dmc_unrar_file *dmc_unrar_next_file(dmc_unrar_archive *archive, const dmc_unrar_file *file) function that reads in one additional file block and returns the usual stats structure. If given NULL as the file parameter, it reads the first one. If NULL is returned, no further files are in the archive.
- dmc_unrar_get_filename_file(), dmc_unrar_file_is_directory_file(), dmc_unrar_extract_file_to_*_file() functions that take a const dmc_unrar_file * instead of an index and do the usual.
- (dmc_unrar_next_file() will grow the internal structures within the dmc_archive, the functions taking indices should still work on the files found so far. And also, a call to dmc_unrar_next_file() will invalidate previous const dmc_unrar_file *)
- struct dmc_unrar_file will be expanded to contain some internal pointer to the archive, and also more user-readable fields like the index, offset within the file (for an estimation on the archive extraction progress).
@fasterthanlime, does that sound reasonable and like something you could use? Or am I going off in a totally wrong direction?
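The iteration pattern in the proposal can be sketched as follows. This is a mock, not dmc_unrar: the `sketch_` types and functions are hypothetical stand-ins that only model the described behaviour (NULL to start, NULL at the end, a back-pointer plus index and offset in the file struct, and earlier pointers becoming stale after the next call).

```c
#include <assert.h>
#include <stddef.h>

typedef struct sketch_archive sketch_archive;

/* Hypothetical file stats; the real struct dmc_unrar_file differs. */
typedef struct {
    const sketch_archive *archive; /* back-pointer, as the proposal suggests */
    size_t index;                  /* position of the file in the archive */
    size_t offset;                 /* start of the file block in the archive */
    const char *name;
} sketch_file;

struct sketch_archive {
    const char *const *names; /* pretend file blocks, in archive order */
    const size_t *offsets;
    size_t total;             /* blocks actually present in the archive */
    size_t discovered;        /* blocks read so far; 0 right after opening */
    sketch_file current;      /* storage reused by every call */
};

/* Reads one more file block. Pass NULL to get the first file; a NULL
 * return means there are no further files. The returned pointer is only
 * valid until the next call, because the storage is reused. */
const sketch_file *sketch_next_file(sketch_archive *archive,
                                    const sketch_file *file)
{
    assert(archive != NULL);

    /* Compute the index first: `file` may point at archive->current. */
    size_t index = (file == NULL) ? 0 : file->index + 1;
    if (index >= archive->total)
        return NULL;

    archive->current.archive = archive;
    archive->current.index = index;
    archive->current.offset = archive->offsets[index];
    archive->current.name = archive->names[index];
    if (index + 1 > archive->discovered)
        archive->discovered = index + 1;
    return &archive->current;
}

/* Walks the whole archive linearly and returns how many files it saw. */
size_t sketch_count_files(sketch_archive *archive)
{
    size_t n = 0;
    for (const sketch_file *f = sketch_next_file(archive, NULL);
         f != NULL;
         f = sketch_next_file(archive, f))
        n++;
    return n;
}
```

In a real implementation the loop body would call the extraction function on each file before advancing, and the `offset` field could feed a progress estimate.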