adamhathcock / sharpcompress

SharpCompress is a fully managed C# library to deal with many compression types and formats.
MIT License

Cannot Extract TarArchive - no entries found! #229

Closed. SimonCahill closed this issue 7 years ago.

SimonCahill commented 7 years ago

I have a simple re-indexing feature, and some older archives (application structure) are stored as .tar.gz files. The tarballs decompress fine and are detected as .tar files within the application; however, no entries are found!

I've opened the file in WinRAR: there are definitely files in the archive, and WinRAR can extract them.

I'm using VS 2017 and C# 7 with .NET Framework 4.6, on Windows 10, and Mono 4.6.

[Screenshots: winrar_entries, lib_entries]

adamhathcock commented 7 years ago

I'm not going to be able to solve this without a sample file.

I recommend trying to debug this yourself, as tar is a simple format. At the very least, that might give a hint about what's going wrong.

SimonCahill commented 7 years ago

Sadly, I can't share the files in question for privacy and confidentiality reasons; otherwise I would have. I'm currently trying to debug this myself. This issue was meant more to report a problem with the library and, possibly, to post a solution for others.

Right now, I'm checking whether the problem lies in my GZ decompressor (unlikely, as it works fine with other file types, but unlikely doesn't mean impossible).

adamhathcock commented 7 years ago

Why are you using a separate GZ decompressor rather than SharpCompress's? I'm asking because I'm always looking to improve the implementations.

SimonCahill commented 7 years ago

The method I implemented is simpler to use and consumes fewer resources. I'm using the compressor/decompressor provided by .NET and simply passing a FileInfo/string and a ref MemoryStream. I'm processing several thousand files, some of them several hundred megabytes in size, and I can't afford to wait for the GC to come along and clean up all those objects; I'd rather have one stream and reuse it.

I've tested both SharpCompress's methods and mine, and so far mine has proven less resource-hungry in our scenarios.

Also, I've confirmed it's not my decompressor. No matter how I extract the tar archive, the archive object always shows no entries. TotalSize also appears to be null.

What disturbs me most at this point is that these tar archives were generated with SharpCompress in the first place.

EDIT: In case you're curious, this is how I call my decompressor: UncompressGzFile(file.FullName, ref memStream);
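
Only the call site appears in the thread (the real code is on pastebin), so the following is a hypothetical reconstruction, not SimonCahill's implementation. It assumes .NET's `System.IO.Compression.GZipStream`, which the comment above says is used, and shows what reusing one `MemoryStream` safely involves: truncating whatever the previous file left behind, then rewinding after the copy. The `GzHelper` class name is illustrative.

```csharp
using System.IO;
using System.IO.Compression;

// Hypothetical sketch: the thread only shows UncompressGzFile(file.FullName, ref memStream);
// the body below is an assumption, not the reporter's pastebin code.
public static class GzHelper
{
    // Decompresses the .gz file at `path` into the caller's reusable stream.
    public static void UncompressGzFile(string path, ref MemoryStream memStream)
    {
        // `ref` only matters here: it lets the method hand back a new instance.
        if (memStream == null)
            memStream = new MemoryStream();

        // Drop the previous file's bytes so Length reflects only this file.
        // SetLength(0) also clamps Position back to 0.
        memStream.SetLength(0);

        using (FileStream input = File.OpenRead(path))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        {
            gzip.CopyTo(memStream);
        }

        // CopyTo leaves Position == Length; rewind so the next reader starts at byte 0.
        memStream.Position = 0;
    }
}
```

Because `MemoryStream` is a reference type, the caller's stream is already shared without `ref`; the keyword is only needed if the method may replace the instance, as the null check above does.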

adamhathcock commented 7 years ago

I've been toying with the idea of using the framework streams depending on the platform. It should be easy to do.

The tar implementation isn't robust. However, nothing immediately comes to mind as the problem area, other than a possible size issue, such as handling int vs. long, or something like that.
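
Why int vs. long matters here: in a ustar header, the entry size is stored as ASCII octal digits in a 12-byte field at offset 124, and 11 octal digits can encode sizes up to 8 GiB minus 1 byte, well beyond `int.MaxValue`. The sketch below parses that field into a `long`. `TarSize` and `ParseOctalSize` are illustrative names, not SharpCompress's code, and GNU's base-256 extension for even larger sizes is deliberately not handled.

```csharp
using System;

// Illustrative sketch (not SharpCompress's implementation) of reading the ustar size field.
public static class TarSize
{
    public const int SizeOffset = 124;  // size field starts at byte 124 of the 512-byte header
    public const int SizeLength = 12;   // 11 octal digits plus a NUL or space terminator

    // Returns the entry size as a long so sizes of 2 GiB and above don't overflow.
    public static long ParseOctalSize(byte[] header)
    {
        int i = SizeOffset, end = SizeOffset + SizeLength;

        // Some writers pad the number with leading spaces instead of zeros.
        while (i < end && header[i] == (byte)' ')
            i++;

        long size = 0;
        for (; i < end; i++)
        {
            byte b = header[i];
            if (b == 0 || b == (byte)' ')
                break;  // NUL or space terminates the number
            if (b < '0' || b > '7')
                throw new FormatException("Non-octal byte in tar size field");
            size = size * 8 + (b - '0');
        }
        return size;
    }
}
```

For example, the field `"77777777777\0"` parses to 8,589,934,591, which would silently overflow if accumulated in an `int`.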

SimonCahill commented 7 years ago

(Code snipped. Snippet uploaded to pastebin: https://pastebin.com/k4DtdFtr)

That's my decompression code. You could change the parameters to choose whether to compress or decompress a stream. It's fairly simple but very effective.

I think I've narrowed the issue down to the stream's position somehow ending up at the end. I'm about to run another test.
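
A position stuck at the end would produce exactly this symptom. After writing into a `MemoryStream`, its `Position` equals its `Length`, so a consumer that starts reading at the current position sees nothing. Whether SharpCompress seeks back on its own is not settled in the thread, so treat that as an assumption. The sketch below (class and method names are hypothetical) shows the symptom and the one-line fix.

```csharp
using System.IO;

// Hypothetical demo: assumes the consumer reads from the stream's current position.
public static class StreamPositionDemo
{
    // Reads from the current position to the end and counts the bytes seen.
    public static long CountBytesFromCurrentPosition(Stream s)
    {
        var buffer = new byte[4096];
        long total = 0;
        int n;
        while ((n = s.Read(buffer, 0, buffer.Length)) > 0)
            total += n;
        return total;
    }

    public static void Run(byte[] data, out long beforeRewind, out long afterRewind)
    {
        using (var ms = new MemoryStream())
        {
            ms.Write(data, 0, data.Length);                   // Position is now == Length
            beforeRewind = CountBytesFromCurrentPosition(ms); // 0: nothing left to read
            ms.Position = 0;                                  // rewind before handing the stream on
            afterRewind = CountBytesFromCurrentPosition(ms);  // data.Length
        }
    }
}
```

Applied to this thread, that would mean setting `memStream.Position = 0` (or calling `Seek(0, SeekOrigin.Begin)`) after decompressing and before passing the stream to SharpCompress.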

SimonCahill commented 7 years ago

I've saved the extracted tar file to disk and opened it in a hex editor. I'm not too familiar with tar headers, but this seems almost correct:
[Screenshot: tar_header]

EDIT: After exiting the application and opening it again, the file opens and has entries. I have a feeling this is going to be somewhat annoying to fix. From what I can gather, the stream itself is not outputting everything it should.
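
One way to tell whether a header like the one in the screenshot is actually correct is to verify its checksum. Tar defines it as the unsigned sum of all 512 header bytes, with the 8-byte checksum field at offset 148 counted as ASCII spaces. The stored value is octal text. POSIX and GNU headers also carry the magic `ustar` at offset 257, while old v7-style headers have none. `TarHeaderCheck` is a hypothetical helper for illustration, not part of SharpCompress.

```csharp
using System.Text;

// Hypothetical helper for sanity-checking a single 512-byte tar header block.
public static class TarHeaderCheck
{
    const int BlockSize = 512;
    const int ChecksumOffset = 148, ChecksumLength = 8;

    // Unsigned sum of all header bytes, treating the checksum field itself as spaces.
    public static int ComputeChecksum(byte[] header)
    {
        int sum = 0;
        for (int i = 0; i < BlockSize; i++)
        {
            bool inChecksumField = i >= ChecksumOffset && i < ChecksumOffset + ChecksumLength;
            sum += inChecksumField ? (byte)' ' : header[i];
        }
        return sum;
    }

    // Stored checksum: octal digits, optionally space-padded, ended by NUL or space.
    // Returns -1 if a non-octal byte is found.
    public static int ReadStoredChecksum(byte[] header)
    {
        int i = ChecksumOffset, end = ChecksumOffset + ChecksumLength, value = 0;
        while (i < end && header[i] == (byte)' ')
            i++;
        for (; i < end; i++)
        {
            byte b = header[i];
            if (b == 0 || b == (byte)' ')
                break;
            if (b < '0' || b > '7')
                return -1;
            value = value * 8 + (b - '0');
        }
        return value;
    }

    public static bool LooksValid(byte[] header) =>
        header.Length >= BlockSize && ComputeChecksum(header) == ReadStoredChecksum(header);

    // "ustar" at offset 257 marks POSIX/GNU headers; v7 headers leave this area zeroed.
    public static bool HasUstarMagic(byte[] header) =>
        header.Length >= BlockSize && Encoding.ASCII.GetString(header, 257, 5) == "ustar";
}
```

Run it on each 512-byte block that should be a header. An all-zero block fails the check, which is expected: a tar archive ends with two zero-filled blocks rather than another header.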

SimonCahill commented 7 years ago

I seem to have fixed the issue. Somehow, creating a new MemoryStream from the buffer of the "old" MemoryStream solved it. Perplexing, seeing as the two streams are virtually identical.

For others seeing this behavior: first try creating a new MemoryStream from the buffer of the previous stream; that solved the issue for me.
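
The thread never pins down why this worked. One plausible explanation, consistent with the earlier suspicion that the stream's position had moved to the end, is that `new MemoryStream(byte[])` always starts at `Position` 0, with `Length` equal to the array length. The sketch below (helper names are hypothetical) contrasts that workaround with two alternatives. Note that `GetBuffer()` returns the entire internal buffer, including unused capacity, so wrapping it without a count can append trailing zero bytes.

```csharp
using System.IO;

// Hypothetical helpers comparing ways to hand a filled MemoryStream to a reader.
public static class RecreateStreamDemo
{
    // The thread's workaround, done with ToArray(): copies exactly Length bytes,
    // and the new stream starts at Position 0.
    public static MemoryStream FreshCopy(MemoryStream old) =>
        new MemoryStream(old.ToArray());

    // Zero-copy variant: shares the internal buffer but limits the view to Length bytes.
    // GetBuffer() throws UnauthorizedAccessException if the buffer is not exposable.
    public static MemoryStream SharedView(MemoryStream old) =>
        new MemoryStream(old.GetBuffer(), 0, (int)old.Length, false);

    // Cheapest option, if position is the real culprit: rewind the existing stream.
    public static MemoryStream Rewound(MemoryStream old)
    {
        old.Position = 0;
        return old;
    }
}
```

If rewinding alone fixes the problem, it also fits the goal stated earlier in the thread of reusing a single stream without extra allocations.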