SimonCahill closed this issue 7 years ago
I'm not going to be able to solve this without a sample file.
I recommend trying to debug this yourself, as tar is a simple format. That might at least give a hint about what's going wrong.
Sadly I can't give you the files in question, otherwise I would have (privacy and confidentiality). I'm currently attempting to debug the solution myself. This issue was more a post to mention an issue with the library, and possibly add a solution for others.
Currently, I'm determining whether an issue occurred with my GZ-uncompressor (unlikely, as it works fine with other file types, but then again, unlikely doesn't mean impossible).
Why are you using a separate GZ decompressor rather than SharpCompress's? I'm asking as I'm always looking to improve implementations.
The method I implemented is simpler to use and doesn't consume as many resources. I'm using the compressor/decompressor provided by .NET, and simply passing a FileInfo/string and a ref MemoryStream. I'm processing several thousand files, some of them several hundred megabytes in size, and I can't afford to wait for the GC to come along and clear all those objects - I'd rather have one stream and re-use it.
I've tested both SharpCompress' methods, and mine, and so far mine has proven to be less resource-hungry in our scenarios.
Also, I've confirmed it has nothing to do with my uncompressor. No matter how I extract the tar archive, it always shows up as empty in the resulting objects. TotalSize is also apparently null.
What disturbs me most at this point is that these tar archives were generated with SharpCompress in the first place.
EDIT: As you're likely curious: method prototype for my uncompressor: UncompressGzFile(file.FullName, ref memStream);
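For readers following along, here is a minimal sketch of what a helper with that prototype could look like. It is not the code from the pastebin link; the class name `GzHelper` and the method body are assumptions, built only on the framework's `GZipStream` and a caller-owned `MemoryStream` that is re-used between files.

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class GzHelper
{
    // Hypothetical body for the prototype quoted above (assumption, not the
    // original implementation). The ref modifier only mirrors that prototype.
    public static void UncompressGzFile(string path, ref MemoryStream target)
    {
        // Drop the previous contents but keep the already allocated buffer.
        target.SetLength(0);

        using (var file = File.OpenRead(path))
        using (var gz = new GZipStream(file, CompressionMode.Decompress))
        {
            gz.CopyTo(target);
        }

        // CopyTo leaves Position at the end of the data; rewind so the next
        // reader starts at the first byte.
        target.Position = 0;
    }
}
```

Re-using one stream this way avoids allocating a new large buffer per file, at the cost of having to manage the stream's length and position by hand.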
I've been toying with using the framework streams depending on platform. Should be easy to do.
The tar implementation isn't robust. However, nothing immediately comes to mind about the problem area, other than a possible size issue involving int vs. long or something like that.
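To illustrate the int/long concern: in a ustar header block the entry size is stored as an octal ASCII string at offset 124 (12 bytes), and 11 octal digits can express values up to 8 GiB - 1, well beyond `int.MaxValue`. The sketch below is not SharpCompress's code; `TarHeader.ReadSize` is a hypothetical helper, and it does not handle the GNU base-256 size extension.

```csharp
using System;
using System.Text;

static class TarHeader
{
    // Hypothetical helper: reads the size field from a 512-byte tar header block.
    public static long ReadSize(byte[] header)
    {
        // Size field: offset 124, length 12, octal digits padded with NUL or space.
        string text = Encoding.ASCII.GetString(header, 124, 12).Trim('\0', ' ');

        // Parse as a 64-bit value; an int would overflow for entries over 2 GiB.
        return text.Length == 0 ? 0L : Convert.ToInt64(text, 8);
    }
}
```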
--snip-- Snippet uploaded to pastebin: https://pastebin.com/k4DtdFtr
That's my uncompressing code. You could change the parameters to decide whether to compress or decompress a stream. It's fairly simple, but very effective.
I think I've narrowed the issue down to the position of the stream somehow moving to the end. I'm just about to run another test.
I've saved the extracted tar file to disk and opened it in a hex editor. I'm not too familiar with tar headers, but this seems almost correct:
EDIT: After exiting the application, opening again, the file opens and has entries. I have a feeling this is going to get somewhat more annoying to fix. From what I can gather now, it seems as though the stream itself is not outputting everything it should.
I seem to have fixed the issue. Somehow, creating a new memory stream from the buffer of the "old" memory stream has solved it. Perplexing, seeing as the memory streams are virtually identical.
For others observing this behavior, first attempt to create a new memory stream with the buffer of the previous stream - that has solved this issue for me.
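One plausible explanation, consistent with the earlier suspicion about the stream position but not confirmed in this thread, is that the re-used stream was still positioned at its end after being written to, so a reader starting from the current position saw no data. A new `MemoryStream` over the same buffer starts at position 0. The sketch below shows both the workaround and a plain rewind; `StreamRewind`, `Rewrap`, and `Demo` are hypothetical names.

```csharp
using System;
using System.IO;

static class StreamRewind
{
    // Hypothetical helper: wraps the valid bytes of an existing stream in a
    // new MemoryStream whose Position starts at 0.
    public static MemoryStream Rewrap(MemoryStream old)
    {
        // GetBuffer() returns the whole internal array, including unused
        // capacity, so limit the new stream to old.Length bytes.
        return new MemoryStream(old.GetBuffer(), 0, (int)old.Length);
    }

    public static void Demo()
    {
        var memStream = new MemoryStream();
        memStream.Write(new byte[] { 1, 2, 3, 4 }, 0, 4);

        // Position is at the end after writing, so nothing can be read.
        Console.WriteLine(memStream.ReadByte()); // -1

        // Workaround from this thread: a new stream over the same buffer.
        using (var fresh = Rewrap(memStream))
        {
            Console.WriteLine(fresh.ReadByte()); // 1
        }

        // Rewinding the original stream has the same effect.
        memStream.Position = 0;
        Console.WriteLine(memStream.ReadByte()); // 1
    }
}
```

If rewinding with `Position = 0` before handing the stream to the tar reader is enough, it avoids creating the second stream object altogether.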
I have a simple re-index feature, and some older archives (application structure) are stored as .tar.gz archives. The tars are unzipped fine, and they are detected as .tar files within the application; however, no entries are found!
I've opened the file in WinRAR, and there most definitely are files within the archive, and they can be extracted with WinRAR.
I'm using VS 2017, C# 7 with .NET Framework 4.6, Windows 10 and Mono 4.6.