adamhathcock / sharpcompress

SharpCompress is a fully managed C# library to deal with many compression types and formats.
MIT License

SharpCompress unable to unzip few tar.gz #128

Closed HrishikeshSingh closed 8 years ago

HrishikeshSingh commented 8 years ago

I am using the SharpCompress NuGet package to unzip files. There are a few files that SharpCompress is unable to unzip. You can get one such file from https://github.com/isagalaev/highlight.js/archive/8.9.1.tar.gz

Below is the exception that I am getting:

    System.ArgumentNullException was unhandled
      HResult=-2147467261
      Message=Value cannot be null. Parameter name: path2
      ParamName=path2
      Source=mscorlib
      StackTrace:
        at System.IO.Path.Combine(String path1, String path2)
        at SharpCompress.Reader.IReaderExtensions.WriteEntryToDirectory(IReader reader, String destinationDirectory, ExtractOptions options)
        at Sharpcompress.Program.Main(String[] args) in E:\2015Projects\Sharpcompress\Program.cs:line 24
        at System.AppDomain._nExecuteAssembly(RuntimeAssembly assembly, String[] args)
        at System.AppDomain.ExecuteAssembly(String assemblyFile, Evidence assemblySecurity, String[] args)
        at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
        at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
        at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
        at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
        at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
        at System.Threading.ThreadHelper.ThreadStart()
      InnerException:

The code used to unzip is

    string filePath = @"E:\drop\highlight.js-8.9.1.tar.gz";
    using (Stream stream = File.OpenRead(filePath))
    {
        var reader = ReaderFactory.Open(stream);
        while (reader.MoveToNextEntry())
        {
            if (!reader.Entry.IsDirectory)
            {
                reader.WriteEntryToDirectory(@"E:\drop\archive", ExtractOptions.ExtractFullPath | ExtractOptions.Overwrite);
            }
        }
    }

adamhathcock commented 8 years ago

Looks like an entry in the tar has a weird or invalid name that SharpCompress isn't handling.

If you could debug further that would be great.

HrishikeshSingh commented 8 years ago

This issue is reproducible with any tar.gz file that I download from GitHub. To reproduce: go to the Releases tab of a repository, download the tar.gz of any release, and try to extract it. It always fails. An invalid or weird entry name cannot be the explanation for all of them; more likely GitHub is adding something extra that SharpCompress does not handle. I also tried the SharpZipLib NuGet package and that works fine, but there is a license issue with it, so I wanted to use SharpCompress. I could not figure out what was wrong.

adamhathcock commented 8 years ago

I'm not going to have time to debug this. I'd be hard-pressed to say that every tar.gz has this problem, but maybe GitHub's tar generation does something that my tar implementation hasn't covered.

benshoof commented 8 years ago

Tars created by git-archive start with a tar entry with metadata about the commit they're produced from. This entry is named pax_global_header (POSIX global header) and has an entry type of 'g', which is Global Extended Header. SharpCompress doesn't recognize this entry type yet so it fails the IsTarFile() test. All that needs to be done is to add this entry type to SharpCompress.
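To check the entry type described above without going through SharpCompress, the first tar header can be read directly. The following is a minimal sketch, not SharpCompress code: the class and method names (`TarHeaderPeek`, `ReadFirstHeader`, `ReadFirstHeaderFromGzip`) are made up for illustration, and it assumes the standard ustar/pax header layout, where each header is a 512-byte block with the entry name in bytes 0-99 and the type flag at byte 156.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper for inspecting a tar stream; not part of SharpCompress.
public static class TarHeaderPeek
{
    // ustar/pax layout: one 512-byte block per header,
    // name in bytes 0-99, type flag in byte 156.
    private const int BlockSize = 512;
    private const int NameLength = 100;
    private const int TypeFlagOffset = 156;

    // Reads the first header block from an uncompressed tar stream
    // and returns the entry name and its type flag character.
    public static (string Name, char TypeFlag) ReadFirstHeader(Stream tar)
    {
        var block = new byte[BlockSize];
        int read = 0;
        // Stream.Read may return fewer bytes than requested, so loop until the block is full.
        while (read < BlockSize)
        {
            int n = tar.Read(block, read, BlockSize - read);
            if (n == 0)
            {
                throw new EndOfStreamException("Stream is shorter than one tar block.");
            }
            read += n;
        }

        // The name field is NUL-terminated unless it fills all 100 bytes.
        int nameEnd = Array.IndexOf(block, (byte)0, 0, NameLength);
        if (nameEnd < 0)
        {
            nameEnd = NameLength;
        }
        string name = Encoding.ASCII.GetString(block, 0, nameEnd);
        return (name, (char)block[TypeFlagOffset]);
    }

    // Convenience wrapper for .tar.gz files: decompresses on the fly.
    public static (string Name, char TypeFlag) ReadFirstHeaderFromGzip(string path)
    {
        using (var file = File.OpenRead(path))
        using (var gzip = new GZipStream(file, CompressionMode.Decompress))
        {
            return ReadFirstHeader(gzip);
        }
    }
}
```

If the description above holds, calling `TarHeaderPeek.ReadFirstHeaderFromGzip` on a tar.gz produced by git-archive should report the name pax_global_header and the type flag 'g'.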

Add this to the EntryType enum in TarHeader.cs:

GlobalExtendedHeader = (byte) 'g'

This does mean that SharpCompress will extract this entry to a file named pax_global_header, so if you don't want that, you'll have to manually exclude it in your code. From the quick sampling I've done, other third-party tar implementations behave the same.
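The manual exclusion mentioned above could look something like the sketch below. `EntryFilter` and `ShouldExtract` are hypothetical names, not SharpCompress APIs; the helper only looks at the entry name as a string. It also rejects null or empty names, on the assumption that a nameless entry cannot be given a sensible output path.

```csharp
using System;
using System.IO;

// Hypothetical helper; not part of SharpCompress.
public static class EntryFilter
{
    // Name that git-archive gives to the global extended header entry.
    private const string PaxGlobalHeaderName = "pax_global_header";

    // Returns true when an entry with this name should be written to disk.
    public static bool ShouldExtract(string entryName)
    {
        // Assumption: an entry without a name has no usable output path, so skip it.
        if (string.IsNullOrEmpty(entryName))
        {
            return false;
        }

        // Normalise separators so the check behaves the same for '/' and '\'.
        string fileName = Path.GetFileName(entryName.Replace('\\', '/'));
        return !string.Equals(fileName, PaxGlobalHeaderName, StringComparison.Ordinal);
    }
}
```

In the extraction loop from the original report, the call to `WriteEntryToDirectory` would then be guarded by `EntryFilter.ShouldExtract(...)`, passing in the entry's name. The property on `reader.Entry` that exposes that name depends on the SharpCompress version, so it is left out here.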

While debugging this I came across a more significant SharpCompress bug that this sample archive just happened to trip. I'll write that up separately when I have the time but for now the archive should just work once GlobalExtendedHeader is defined.

adamhathcock commented 8 years ago

Thanks for the info. You rule :)

adamhathcock commented 8 years ago

To be a bit more accurate: thanks for the research :)

I know the tar implementation is not robust and I ought to find another or recode it. Anything you find is good info. Thanks again.

HrishikeshSingh commented 8 years ago

When can I expect the fix to be rolled out?