adamhathcock / sharpcompress

SharpCompress is a fully managed C# library to deal with many compression types and formats.
MIT License

Skipping entries with InvalidFormatException #656

Open pgodwin opened 2 years ago

pgodwin commented 2 years ago

Firstly, thanks for such a terrific library!

Quick question - is it possible to skip entries with an InvalidFormatException?

I have some RAR files (comic book CBR files) that seem to have been corrupted. WinRAR will skip over those files, but SharpCompress throws a SharpCompress.Common.InvalidFormatException: 'Unknown Rar Header: 205' exception.

It seems to happen on the MoveNext() of the Entries enumerator.

using (var archive = RarArchive.Open(fileStream))
{
    //foreach (var entry in archive.Entries)
    var entries = archive.Entries.GetEnumerator();
    while (entries.MoveNext()) // <-- SharpCompress.Common.InvalidFormatException: 'Unknown Rar Header: 205'
...
}
adamhathcock commented 2 years ago

SharpCompress is more raw than WinRAR. You'll have to manually try/catch errors like that.

If there's a better way to handle corrupt entries and skip them, I'm open to changing the API.
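A minimal sketch of that manual try/catch, wrapped around MoveNext() since that is where the exception was reported. `SafeEnumeration` and `TakeUntilError` are hypothetical names for illustration and are not part of SharpCompress. The thread does not establish whether enumeration can resume past a bad header, so this sketch assumes it cannot: it stops at the first failure and keeps the entries read before it.

```csharp
using System;
using System.Collections.Generic;

public static class SafeEnumeration
{
    // Hypothetical helper (not a SharpCompress API).
    // Yields items until the sequence ends or MoveNext() throws.
    // The exception is handed to onError and enumeration stops there.
    public static IEnumerable<T> TakeUntilError<T>(
        IEnumerable<T> source, Action<Exception> onError)
    {
        using (var enumerator = source.GetEnumerator())
        {
            while (true)
            {
                bool hasNext;
                try
                {
                    hasNext = enumerator.MoveNext();
                }
                catch (Exception ex)
                {
                    // Catching Exception is deliberately broad for the sketch;
                    // real code would likely filter for InvalidFormatException.
                    onError(ex);
                    hasNext = false;
                }

                if (!hasNext)
                {
                    yield break;
                }

                // The yield sits outside the try/catch, as C# iterators require.
                yield return enumerator.Current;
            }
        }
    }
}
```

With the archive from the original question, usage would look something like `foreach (var entry in SafeEnumeration.TakeUntilError(archive.Entries, ex => Console.Error.WriteLine(ex.Message))) { ... }`.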

pgodwin commented 2 years ago

Strangely, if I call RarArchive.Entries.Count before looping through the entries, the exception isn't raised.

adamhathcock commented 2 years ago

The entries are built using only the header that holds the file info; the reader then skips to the next header without reading everything.

It's likely that a file is stored/compressed incorrectly but the recorded sizes are fine.

pgodwin commented 2 years ago

I think that's it. I tried creating some purposefully corrupted RAR files to reproduce the issue, but couldn't (I only ended up creating invalid vints or changing the number of bytes in the file).

I'm happy to close this issue as it's clearly by design and isn't a bug in the code.

At this point, I don't have any API suggestions to improve things. It'd be good to know if an entry is invalid or can't be read, and to be able to skip to the next item, but I'm not too sure how you would do so cleanly (perhaps overload IEnumerator.MoveNext() with an option to ignore bad entries, or add a State/Exception field to the collection).

adamhathcock commented 2 years ago

Yeah, the problem is that you can only "detect" it by trying to decompress the file. Putting a try/catch in the correct place is the trick but doing it in a clean way is the hard part.
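One way to sketch "detect it by trying to decompress" is to read an entry's stream to the end and discard the bytes, catching whatever is thrown. `EntryProbe` and `TryReadToEnd` are hypothetical names, not part of SharpCompress. The helper takes a plain `Func<Stream>` so it stays self-contained; with SharpCompress the opener would presumably be the entry's `OpenEntryStream` method. Whether the archive is still usable for later entries after a failed read is an assumption this sketch does not check.

```csharp
using System;
using System.IO;

public static class EntryProbe
{
    // Hypothetical helper (not a SharpCompress API).
    // Opens a stream via `open`, reads it to the end, and discards the data.
    // Returns true if the read completes; otherwise returns false and
    // reports the exception through `error`.
    public static bool TryReadToEnd(Func<Stream> open, out Exception error)
    {
        try
        {
            using (var stream = open())
            {
                // Stream.Null swallows the bytes, so nothing is buffered.
                stream.CopyTo(Stream.Null);
            }

            error = null;
            return true;
        }
        catch (Exception ex)
        {
            // Broad catch for illustration; real code would likely
            // narrow this to the library's format exceptions.
            error = ex;
            return false;
        }
    }
}
```

Usage would be along the lines of `if (!EntryProbe.TryReadToEnd(entry.OpenEntryStream, out var error)) { /* log and skip this entry */ }`. The cost is that each entry is decompressed once just to be checked, which is the "clean way is the hard part" trade-off mentioned above.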