adamhathcock / sharpcompress

SharpCompress is a fully managed C# library to deal with many compression types and formats.
MIT License

Random access an encrypted multipart rar volume #77

Dreamcooled closed this issue 9 years ago

Dreamcooled commented 9 years ago

Hei @adamhathcock, thanks for sharing this project with us. I'm trying to achieve some complicated stuff with rar, and for that I have looked at a number of unrar implementations (including the unrar code itself). Your implementation is by far the most advanced and best structured one :+1:.

What I'm trying to achieve:

    var streams = new List<LazyStream>();
    ....
    RarArchive a = RarArchive.Open(streams, Options.None);
    a.Entries.First().OpenEntryStream().Read(...)

Even if we leave out the encryption for a second, we have multiple problems:

My Questions:

Background: Movies and TV shows hosted on one-click hosters are often delivered in 100 MB parts. I already have a piece of software which allows you to automatically unpack parts (and start watching the content) while the missing parts are still loading. But if you take this idea to the next level and combine a downloader, an extractor and a player into one application, you would be able to provide an even greater experience (seeking!). Seeking in an MKV file is not a problem. Seeking in an AES stream is a bit harder (but should still be possible in theory, right?). But seeking in a multipart rar volume.....

adamhathcock commented 9 years ago

EntryStreams just aren't seekable. Decompression has to happen in order, just like decryption. You can't simply jump ahead in the process. This might not hold for every compression algorithm, just as it doesn't hold for every encryption algorithm. For example, ECB mode encrypts each block independently of the others. However, this is much less secure and not used. Plus, you'd likely have to rewrite CryptoStream to even know how to deal with skipping blocks.

The product I work on actually compresses its files in 4k blocks to allow for random seeking on the file. However, this is a custom file format and no well-known archives do this.
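The block-wise idea described above can be sketched roughly as follows. This is not the format of the product mentioned, which is not shown in this thread; `BlockCompressedData` is a hypothetical class, and the use of Deflate from `System.IO.Compression` and an in-memory block list are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical container: every fixed-size block of the input is deflated
// independently, so any block can be inflated without its predecessors.
public sealed class BlockCompressedData
{
    private readonly List<byte[]> _blocks = new List<byte[]>();

    public int BlockSize { get; }
    public long Length { get; }

    public BlockCompressedData(byte[] data, int blockSize = 4096)
    {
        BlockSize = blockSize;
        Length = data.Length;
        for (int offset = 0; offset < data.Length; offset += blockSize)
        {
            int count = Math.Min(blockSize, data.Length - offset);
            using (var output = new MemoryStream())
            {
                // Disposing the DeflateStream flushes the last compressed bytes.
                using (var deflate = new DeflateStream(output, CompressionMode.Compress, leaveOpen: true))
                {
                    deflate.Write(data, offset, count);
                }
                _blocks.Add(output.ToArray());
            }
        }
    }

    // Returns `count` bytes starting at `position`, inflating only the blocks touched.
    public byte[] Read(long position, int count)
    {
        if (position < 0 || count < 0 || position + count > Length)
            throw new ArgumentOutOfRangeException();

        var result = new byte[count];
        int written = 0;
        while (written < count)
        {
            long current = position + written;
            int blockIndex = (int)(current / BlockSize);
            int offsetInBlock = (int)(current % BlockSize);
            byte[] block = Inflate(_blocks[blockIndex]);
            int n = Math.Min(count - written, block.Length - offsetInBlock);
            Array.Copy(block, offsetInBlock, result, written, n);
            written += n;
        }
        return result;
    }

    private static byte[] Inflate(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var inflate = new DeflateStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            inflate.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

Calling `new BlockCompressedData(data).Read(4090, 20)` inflates only the two blocks that the range spans. The price is a worse compression ratio, because each block starts with an empty dictionary.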

In code, I buffer things in memory to allow some kind of seeking over data I've already seen, but you obviously cannot seek forward. What you'd likely have to do is treat the process almost like a streamed video on Netflix. You buffer the data on the client to some degree and don't allow seeking forward into data that hasn't been buffered on the client. You can't jump forward without downloading the whole file, unfortunately.
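A minimal sketch of that buffering approach, for illustration only. `ReplayBufferStream` is a hypothetical name and is not SharpCompress's own implementation. It keeps everything it has read in memory, so a real player would probably need to cap or spill the buffer.

```csharp
using System;
using System.IO;

// Hypothetical wrapper: remembers every byte read from a forward-only source
// so callers can seek back into data that has already been seen.
public sealed class ReplayBufferStream : Stream
{
    private readonly Stream _source;
    private readonly MemoryStream _seen = new MemoryStream();
    private long _position;

    public ReplayBufferStream(Stream source)
    {
        _source = source;
    }

    public override bool CanRead => true;
    public override bool CanSeek => true;          // only within the buffered range
    public override bool CanWrite => false;
    public override long Length => _seen.Length;   // bytes seen so far, not the total size

    public override long Position
    {
        get { return _position; }
        set { Seek(value, SeekOrigin.Begin); }
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        // Replay previously seen bytes first.
        if (_position < _seen.Length)
        {
            _seen.Position = _position;
            int replayed = _seen.Read(buffer, offset, count);
            _position += replayed;
            return replayed;
        }

        // Otherwise pull fresh bytes from the source and remember them.
        int fresh = _source.Read(buffer, offset, count);
        _seen.Position = _seen.Length;
        _seen.Write(buffer, offset, fresh);
        _position += fresh;
        return fresh;
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        long target;
        switch (origin)
        {
            case SeekOrigin.Begin: target = offset; break;
            case SeekOrigin.Current: target = _position + offset; break;
            default: target = _seen.Length + offset; break; // relative to the buffered end
        }
        if (target < 0 || target > _seen.Length)
            throw new NotSupportedException("Cannot seek outside the data already buffered.");
        _position = target;
        return _position;
    }

    public override void Flush() { }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
}
```

Seeking backwards replays bytes from memory, while a seek past the buffered end throws, which mirrors the "no seeking forward" restriction described above.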

adamhathcock commented 9 years ago

Btw, thanks for the words about the project :)

Dreamcooled commented 9 years ago

Thanks for your answer.

AES CBC allows random access decryption (according to Wikipedia), as can be seen in this diagram. Isn't that what rar is using? But ya, I would have to rewrite the crypto stream.
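The CBC property referred to here is that plaintext block i equals D(C[i]) XOR C[i-1], so decrypting one block needs only the key and the preceding ciphertext block (or the IV for the first block). Below is a hedged sketch using the standard `System.Security.Cryptography` classes. `CbcRandomAccess` and its methods are hypothetical helpers, the data is assumed to be a whole number of 16-byte blocks, and nothing here reflects how rar or SharpCompress handles keys. It covers decryption only: compressed data behind the encryption would still have to be decompressed in order.

```csharp
using System;
using System.Security.Cryptography;

public static class CbcRandomAccess
{
    private const int BlockBytes = 16; // AES block size in bytes

    // Decrypts only block `blockIndex` of an AES-CBC ciphertext.
    public static byte[] DecryptBlock(byte[] ciphertext, byte[] key, byte[] iv, int blockIndex)
    {
        // The previous ciphertext block plays the role of the IV for this block.
        var previous = new byte[BlockBytes];
        if (blockIndex == 0)
            Array.Copy(iv, previous, BlockBytes);
        else
            Array.Copy(ciphertext, (blockIndex - 1) * BlockBytes, previous, 0, BlockBytes);

        using (var aes = Aes.Create())
        {
            aes.Mode = CipherMode.CBC;
            aes.Padding = PaddingMode.None;
            aes.Key = key;
            aes.IV = previous;
            using (var decryptor = aes.CreateDecryptor())
            {
                return decryptor.TransformFinalBlock(ciphertext, blockIndex * BlockBytes, BlockBytes);
            }
        }
    }

    // Encrypts the whole buffer sequentially; included only to produce test data.
    public static byte[] EncryptAll(byte[] plaintext, byte[] key, byte[] iv)
    {
        using (var aes = Aes.Create())
        {
            aes.Mode = CipherMode.CBC;
            aes.Padding = PaddingMode.None;
            aes.Key = key;
            aes.IV = iv;
            using (var encryptor = aes.CreateEncryptor())
            {
                return encryptor.TransformFinalBlock(plaintext, 0, plaintext.Length);
            }
        }
    }
}
```

Encryption in CBC mode remains strictly sequential, because each ciphertext block depends on the one before it. Only decryption can start at an arbitrary block boundary.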

But let's go back to the EntryStream discussion: I understand that you cannot seek to any random position, because the decryption process is kind of like a state-machine - and you wouldn't know in what kind of state or type of block you find yourself if you seek to a random position. Question 1: Are there any fixed positions? Could it be possible to jump to the start of the next volume, and start extracting there? Or are all volumes concatenated and I really have no other chance than to start from the very beginning and process it till the very end?

If we forget all that random access on EntryStreams for a moment: Question 2: How can I extract an (encrypted or non-encrypted) multipart archive from start to end using the RarArchive.Open method (as described in the initial question)? I get a SharpCompress.Common.IncompleteArchiveException when I try to do that...

adamhathcock commented 9 years ago

CBC allows for forward and possibly backward movement. However, you cannot just jump to any point in the stream as you need to have the previous block to encrypt/decrypt the current block.

The same basically holds true for just about any compression I'm aware of and using. Rar volumes compress a file THEN split it. So a file that is 10 MB, compressed into 6 MB, then split into three volumes of 2 MB cannot just start at the second volume without decompressing the first. Compression is typically a continuous stream over a file. You could compress blocks of a file one at a time and then manage the blocks (as I mentioned earlier). Then you can jump to blocks and decompress them on the fly, giving the illusion of random seeking, but that's not how any archive format in SharpCompress works.

RarArchive.Open(string/FileInfo) always expects the first file, which is probably why you get that. There is RarArchive.Open(IEnumerable) which is portable and only acts on streams. However, the multi-volume archive must have its parts delivered in order. Rar only knows about the files in a multivolume archive by filename.
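Since the stream overload expects the parts in order, one way to prepare them is to sort the part files numerically before opening them. The sketch below is an assumption-laden illustration: `RarPartOrdering` is a hypothetical helper, and it only handles the `name.partN.rar` naming scheme. Other schemes such as `.r00`, `.r01` are not covered. The `RarArchive.Open` call appears only in a comment and is quoted from the snippet earlier in this thread.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RarPartOrdering
{
    // Matches "something.part12.rar" and captures the part number.
    private static readonly Regex PartNumber =
        new Regex(@"\.part(\d+)\.rar$", RegexOptions.IgnoreCase);

    // Returns the part files sorted by their numeric part index.
    // A plain string sort would place "part10" before "part2".
    public static List<string> OrderParts(IEnumerable<string> fileNames)
    {
        return fileNames
            .Where(name => PartNumber.IsMatch(name))
            .OrderBy(name => int.Parse(PartNumber.Match(name).Groups[1].Value))
            .ToList();
    }
}

// Possible usage with the call from this thread (not compiled here):
//   var streams = RarPartOrdering.OrderParts(Directory.GetFiles(folder))
//       .Select(path => (Stream)File.OpenRead(path))
//       .ToList();
//   RarArchive a = RarArchive.Open(streams, Options.None);
```

Files that do not match the pattern are dropped rather than guessed at, so a missing part shows up as a gap in the list instead of a silently misordered archive.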

Dreamcooled commented 9 years ago

Thanks a lot for your time and answers. I really appreciate it. Now I'm ready to think a bit more realistically about the goals of my project :).