adamhathcock / sharpcompress

SharpCompress is a fully managed C# library to deal with many compression types and formats.
MIT License
2.25k stars 479 forks source link

Recent changes break opening tars containing rars #117

Closed benshoof closed 8 years ago

benshoof commented 8 years ago

Recent changes cause SharpCompress to incorrectly open a tar containing a rar as if it were a rar, making it unable to process the archive. This affects both Archive and Reader APIs.

The order of archive-type tests was recently changed in ArchiveFactory and ReaderFactory Open() methods and rar is now tested before tar. The problem is that RarArchive.IsRarFile() incorrectly returns true for a tar containing a rar. I found this would happen with both Archive and Reader APIs when the rar file was the first entry but only Archive when the rar appeared later in the file.

I've attached a sample tar containing a rar containing your test.jpg. For maximum confusion, since github won't accept tar attachments, I've placed this inside a zip.

I noticed that ArchiveFactory.Open() passes Options.LookForHeader and Options.KeepStreamsOpen when calling IsRarFile() instead of the options parameter like all the other tests. I don't know if that's related or intentional.

test.zip

adamhathcock commented 8 years ago

It's probably the fact that LookForHeader makes it search for the header when expecting an EXE. This shouldn't be used for ArchiveFactory.Open if that is the case. If you remove that, does that fix it?

I've been busy (as usual) having to work a weekend and also ill. I'm so lax on this project.

benshoof commented 8 years ago

There are two bugs here and I got hit with both.

  1. ArchiveFactory.Open() is always passing LookForHeader, causing any tar with a rar within the the first 0x80000 bytes to pass the rar test and be opened as such. This affects the Archive API and is easy to fix.
  2. The rar signature (marker block) of a rar file isn't being validated when LookForHeader is omitted, causing many non-rar files to appear to be rars. This affects the Reader API right now will affect the Archive API once the other bug is fixed.

The only validation of the rar signature is done in RarFactory.CheckSFX() which only happens when LookForHeader is specified. Without LookForHeader, the parser plows ahead and casts the third byte of the file to a HeaderType and if that type can be parsed without throwing an exception then the file is assumed to be a rar file. These are easy conditions for a random file to happen to pass. The sample tar file I attached happens to pass along with the real world tar that first tripped the ArchiveFactory bug. If the third byte of the file is between 'r' and '{', such as the third character of the first filename in a tar, then as long as the rest of the bytes don't create too large offsets the test will pass.

I notice that MarkHeader has an unreferenced IsValid() and commented-out IsSignature(). It seems like ReadHeaders() should be validating that the first header is a MarkHeader and has the expected values.