adamhathcock / sharpcompress

SharpCompress is a fully managed C# library to deal with many compression types and formats.
MIT License
2.27k stars, 480 forks

System.IO.EndOfStreamException: Unable to read beyond the end of the stream. #463

Open · ThunderBoltEngineer opened this issue 5 years ago

ThunderBoltEngineer commented 5 years ago

I tried to extract the contents of a zip file using this library but got this error:

System.IO.EndOfStreamException: Unable to read beyond the end of the stream.
   at System.IO.BinaryReader.FillBuffer(Int32 numBytes)
   at System.IO.BinaryReader.ReadUInt32()
   at SharpCompress.Common.Zip.StreamingZipHeaderFactory.ReadStreamHeader(Stream stream)+MoveNext()
   at SharpCompress.Readers.Zip.ZipReader.GetEntries(Stream stream)+MoveNext()
   at SharpCompress.Readers.AbstractReader`2.MoveToNextEntry()
...

This library seems to work well for most zip files, but it fails on one particular zip file. Please let me know how to handle this.

adamhathcock commented 5 years ago

Sounds like a corrupt or incomplete zip file. Try opening it with another app to check.

ThunderBoltEngineer commented 5 years ago

@adamhathcock Thanks for your reply, but I can open and extract the zip file with WinRAR. When I re-zipped the extracted contents, the new archive worked. So I guess something is wrong with the original zip file, but I'm not sure exactly what. My question is: why does the library fail to parse the zip file when WinRAR can?

adamhathcock commented 5 years ago

Depends on how you access it. ZipArchive uses the central directory at the end of the file, while ZipReader uses the local headers stored inline with each entry. Have you tried both access mechanisms?

Either the file is somehow being fed to SharpCompress incorrectly, or an entry has a bad file offset to read from. Each entry's offset is recorded in its central directory header, and its sizes appear in both the central directory header and the local file header.
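To make the forward-only approach concrete, here is a minimal, BCL-only sketch that walks the local file headers from the start of the stream with a `BinaryReader`, roughly what the stack trace suggests `StreamingZipHeaderFactory.ReadStreamHeader` does. It is not SharpCompress's code: `LocalHeaderWalker` and `ListEntries` are hypothetical names, and it assumes no ZIP64 fields and UTF-8 (or ASCII) entry names. If the stream ends early, `ReadUInt32` throws the same `EndOfStreamException` seen above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper for illustration, not part of SharpCompress.
public static class LocalHeaderWalker
{
    const uint LocalFileHeaderSig = 0x04034b50;  // "PK\x03\x04"
    const uint CentralDirHeaderSig = 0x02014b50; // "PK\x01\x02"
    const uint EndOfCentralDirSig = 0x06054b50;  // "PK\x05\x06"

    // Walks local file headers front to back, as a forward-only reader must,
    // and returns the entry names. Stops when the central directory begins.
    // Assumes no ZIP64 fields and UTF-8 (or ASCII) entry names.
    public static List<string> ListEntries(Stream stream)
    {
        var names = new List<string>();
        using var reader = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true);
        while (true)
        {
            // Throws EndOfStreamException if fewer than 4 bytes remain.
            uint sig = reader.ReadUInt32();
            if (sig == CentralDirHeaderSig || sig == EndOfCentralDirSig) break;
            if (sig != LocalFileHeaderSig)
                throw new InvalidDataException($"Unexpected signature 0x{sig:X8}");

            reader.ReadUInt16();                    // version needed to extract
            ushort flags = reader.ReadUInt16();     // general purpose bit flag
            ReadFully(reader, 10);                  // method, time, date, CRC-32
            uint compressedSize = reader.ReadUInt32();
            reader.ReadUInt32();                    // uncompressed size
            ushort nameLength = reader.ReadUInt16();
            ushort extraLength = reader.ReadUInt16();
            names.Add(Encoding.UTF8.GetString(ReadFully(reader, nameLength)));
            ReadFully(reader, extraLength);

            // Bit 3: CRC and sizes follow the data in a data descriptor, so the
            // header's size field cannot tell this sketch where the data ends.
            if ((flags & 0x0008) != 0)
                throw new NotSupportedException("Entry uses a data descriptor (bit 3).");
            ReadFully(reader, checked((int)compressedSize)); // skip the entry's data
        }
        return names;
    }

    // Reads exactly count bytes or throws, unlike BinaryReader.ReadBytes,
    // which silently returns fewer bytes at the end of the stream.
    static byte[] ReadFully(BinaryReader reader, int count)
    {
        byte[] buffer = reader.ReadBytes(count);
        if (buffer.Length != count) throw new EndOfStreamException();
        return buffer;
    }
}
```

A real forward-only reader has to handle the bit-3 case by decompressing until the end of the entry's data and then reading the trailing data descriptor; this sketch simply rejects such entries.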

It could be a bug, but this code is fairly widely used at this point. I'm leaning toward a bad file, but I'm not sure.

ThunderBoltEngineer commented 5 years ago

I am using the ReaderFactory class to open the file and then calling MoveToNextEntry() to iterate over the zipped contents. Do you want me to upload the problem file?

adamhathcock commented 5 years ago

Try using ArchiveFactory as a test.

ThunderBoltEngineer commented 5 years ago

@adamhathcock Could you explain the difference between them? Can the ArchiveFactory class be used to extract files?

ThunderBoltEngineer commented 5 years ago

@a764578566 ArchiveFactory worked. I have updated the routine to use it instead of ReaderFactory.

adamhathcock commented 5 years ago

Archive is random access; Reader is forward-only. I try to explain this in the README.

Zip has a directory of file headers at the end of the file (the central directory). ArchiveFactory uses it; ReaderFactory does not.
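The random-access path starts from the end of the file instead. As a hedged, BCL-only sketch (the class and method names are hypothetical, and it assumes no ZIP64 records and UTF-8 names), the code below finds the end-of-central-directory record, walks the central directory, and checks whether each entry's recorded local-header offset actually lands on a local file header signature. That is one way to test the "bad file offset" theory on a problem file without involving SharpCompress.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical diagnostic helper, not part of SharpCompress.
public static class CentralDirectoryCheck
{
    const uint EndOfCentralDirSig = 0x06054b50;  // "PK\x05\x06"
    const uint CentralDirHeaderSig = 0x02014b50; // "PK\x01\x02"
    const uint LocalFileHeaderSig = 0x04034b50;  // "PK\x03\x04"

    // Returns (name, local header offset, offset points at a local header) per entry.
    // Requires a seekable stream. Assumes no ZIP64 records.
    public static List<(string Name, long Offset, bool Ok)> Check(Stream stream)
    {
        using var reader = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true);

        // The EOCD record is 22 bytes plus a comment of up to 65535 bytes,
        // so scan backwards over at most that range for its signature.
        long minPos = Math.Max(0, stream.Length - 22 - 65535);
        long eocd = -1;
        for (long pos = stream.Length - 22; pos >= minPos; pos--)
        {
            stream.Position = pos;
            if (reader.ReadUInt32() == EndOfCentralDirSig) { eocd = pos; break; }
        }
        if (eocd < 0)
            throw new InvalidDataException("No end-of-central-directory record; the file may be truncated.");

        stream.Position = eocd + 10;
        ushort totalEntries = reader.ReadUInt16(); // total entries in the central directory
        reader.ReadUInt32();                       // size of the central directory
        long cdOffset = reader.ReadUInt32();       // offset where the central directory starts

        var results = new List<(string Name, long Offset, bool Ok)>();
        stream.Position = cdOffset;
        for (int i = 0; i < totalEntries; i++)
        {
            long headerStart = stream.Position;
            if (reader.ReadUInt32() != CentralDirHeaderSig)
                throw new InvalidDataException($"Bad central directory header at {headerStart}");
            stream.Position = headerStart + 28;
            ushort nameLength = reader.ReadUInt16();
            ushort extraLength = reader.ReadUInt16();
            ushort commentLength = reader.ReadUInt16();
            stream.Position = headerStart + 42;
            long localOffset = reader.ReadUInt32();  // relative offset of the local header
            string name = Encoding.UTF8.GetString(reader.ReadBytes(nameLength));
            long next = stream.Position + extraLength + commentLength;

            // Does the recorded offset land on a local file header signature?
            bool ok = localOffset + 4 <= stream.Length;
            if (ok)
            {
                stream.Position = localOffset;
                ok = reader.ReadUInt32() == LocalFileHeaderSig;
            }
            results.Add((name, localOffset, ok));
            stream.Position = next;
        }
        return results;
    }
}
```

An entry reported with `Ok == false` would support the bad-offset explanation; if every entry checks out, the problem is more likely in the local headers or entry data that a forward-only reader walks through.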

ThunderBoltEngineer commented 5 years ago

I see. So can I use the ArchiveFactory class to unpack any kind of archive file that this library supports?

pmnforce commented 3 years ago

I'm experiencing the same issue. For me it seems to be related to zip files that are streamed, i.e. files whose final length is unknown when they are created because content is written to them on the fly. This happens, for example, when you download multiple files from Microsoft OneDrive or Dropbox: the zip file has no final length when the download starts.
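One way to check whether a zip was written in this streaming fashion is to look at general-purpose flag bit 3 in the first local file header: when it is set, the CRC-32 and sizes were not known when the header was written and are stored in a data descriptor after the entry's data. The sketch below uses a hypothetical class name and only inspects the first entry. Whether this flag is what trips up `ExtractAllEntries` in this case is not confirmed in the thread.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper for illustration, not part of SharpCompress.
public static class StreamedZipCheck
{
    const uint LocalFileHeaderSig = 0x04034b50; // "PK\x03\x04"

    // True if the first local file header has bit 3 set, i.e. the entry's
    // CRC-32 and sizes live in a data descriptor after its data.
    public static bool FirstEntryUsesDataDescriptor(Stream stream)
    {
        using var reader = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true);
        if (reader.ReadUInt32() != LocalFileHeaderSig)
            throw new InvalidDataException("Stream does not start with a local file header.");
        reader.ReadUInt16();                // version needed to extract
        ushort flags = reader.ReadUInt16(); // general purpose bit flag
        return (flags & 0x0008) != 0;
    }
}
```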

The following code will cause the error to occur:

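using System.IO;
using SharpCompress.Archives;
using SharpCompress.Readers;

// Forward-only extraction through the archive's reader (fails on the streamed zip)
using var archive = ArchiveFactory.Open(fileStream);
using var reader = archive.ExtractAllEntries();
while (reader.MoveToNextEntry())
{
    reader.WriteEntryToFile(Path.GetTempFileName());
}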

The following code will work:

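using System.IO;
using SharpCompress.Archives;

// Random-access extraction through the entries in the central directory (works)
using var archive = ArchiveFactory.Open(fileStream);
foreach (var entry in archive.Entries)
{
    if (entry.IsDirectory) continue; // directories have no data to copy
    // Must not reuse the name fileStream here: redeclaring it inside the loop does not compile
    using var outStream = File.OpenWrite(Path.GetTempFileName());
    using var zipStream = entry.OpenEntryStream();
    zipStream.CopyTo(outStream);
}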

But my understanding is that the first snippet is the recommended practice for this library.

adamhathcock commented 3 years ago

There really isn't a "best practice" because formats can be very different, and zip files created in a streaming manner can differ a lot too. ExtractAllEntries may not be the best fit for your use case. There's nothing wrong with the way you're using the second sample.

pmnforce commented 3 years ago

Thanks for the reply. I realised the reason I used the reader was the way SevenZip archives are handled. So I ended up with code that looks like this, which works beautifully. Thanks for an awesome API, by the way! :)

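using SharpCompress.Archives;
using SharpCompress.Common;
using SharpCompress.Readers;

// outputDir is an assumed destination folder; the options apply to both branches
var options = new ExtractionOptions { ExtractFullPath = true, Overwrite = true };

if (archive.Type == ArchiveType.SevenZip)
{
    // 7z archives are often solid, so extracting sequentially with the reader
    // avoids decompressing a shared block again for every entry
    using var reader = archive.ExtractAllEntries();

    while (reader.MoveToNextEntry())
    {
        if (!reader.Entry.IsDirectory)
            reader.WriteEntryToDirectory(outputDir, options);
    }
}
else
{
    foreach (var entry in archive.Entries)
    {
        if (!entry.IsDirectory)
            entry.WriteToDirectory(outputDir, options);
    }
}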