Open mklaber opened 7 years ago
@adamhathcock I have been unable to find a good way to fix the TarArchive.IsTarFile detection. The TarHeader check will sometimes accept the contents of a compressed stream (gz, bz2, etc.) as a Tar even when it is not one. As part of the proposed fix below, I moved the IsTarFile check to the end of Open. What are your thoughts on adding an option to ReaderOptions, something like TryOpenArchiveInStream, and then making the Open call recursive on compressed streams? If this is an acceptable solution, I am quite happy to create a PR.
    public static IReader Open(Stream stream, ReaderOptions options = null)
    {
        stream.CheckNotNull("stream");
        options = options ?? new ReaderOptions()
        {
            LeaveStreamOpen = false
        };
        RewindableStream rewindableStream = new RewindableStream(stream);
        rewindableStream.StartRecording();
        if (ZipArchive.IsZipFile(rewindableStream, options.Password))
        {
            rewindableStream.Rewind(true);
            return ZipReader.Open(rewindableStream, options);
        }
        rewindableStream.Rewind(false);
        if (GZipArchive.IsGZipFile(rewindableStream))
        {
            rewindableStream.Rewind(false);
            GZipStream decompressedStream = new GZipStream(rewindableStream, CompressionMode.Decompress);
            if (options.TryOpenArchiveInStream)
            {
                try { return Open(decompressedStream, options); }
                catch (InvalidOperationException) { }
            }
            rewindableStream.Rewind(true);
            return GZipReader.Open(rewindableStream, options);
        }
        rewindableStream.Rewind(false);
        if (BZip2Stream.IsBZip2(rewindableStream))
        {
            rewindableStream.Rewind(false);
            BZip2Stream decompressedStream = new BZip2Stream(new NonDisposingStream(rewindableStream), CompressionMode.Decompress, false);
            if (options.TryOpenArchiveInStream)
            {
                try { return Open(decompressedStream, options); }
                catch (InvalidOperationException) { }
            }
        }
        rewindableStream.Rewind(false);
        if (LZipStream.IsLZipFile(rewindableStream))
        {
            rewindableStream.Rewind(false);
            LZipStream decompressedStream = new LZipStream(new NonDisposingStream(rewindableStream), CompressionMode.Decompress);
            if (options.TryOpenArchiveInStream)
            {
                try { return Open(decompressedStream, options); }
                catch (InvalidOperationException) { }
            }
        }
        rewindableStream.Rewind(false);
        if (RarArchive.IsRarFile(rewindableStream, options))
        {
            rewindableStream.Rewind(true);
            return RarReader.Open(rewindableStream, options);
        }
        rewindableStream.Rewind(false);
        if (XZStream.IsXZStream(rewindableStream))
        {
            rewindableStream.Rewind(true);
            XZStream decompressedStream = new XZStream(rewindableStream);
            if (options.TryOpenArchiveInStream)
            {
                try { return Open(decompressedStream, options); }
                catch (InvalidOperationException) { }
            }
        }
        rewindableStream.Rewind(false);
        if (TarArchive.IsTarFile(rewindableStream))
        {
            rewindableStream.Rewind(true);
            return TarReader.Open(rewindableStream, options);
        }
        throw new InvalidOperationException("Cannot determine compressed stream type. Supported Reader Formats: Zip, GZip, BZip2, Tar, Rar, LZip, XZ");
    }
Excel's xlsx format is really just a zipped collection of XML files. If such a file is gzipped, the ReaderFactory seems to try to un-gzip and then un-zip the content. This leads to an IEntry.Key made up of the first bytes of the file rather than the name of the file.

To reproduce: run gzip -k Book1.xlsx, then open the resulting Book1.xlsx.gz with ReaderFactory and dump the entry key.

The Entry.Key value that is dumped is PK ! A7��n [Content_Types].xml �(�

I'd expect the Key to be the file name Book1.xlsx (or at least not the first lines of the file).

Open to other suggestions on how it should work, but as it stands I'd have to special-case *.xlsx.gz files, which seems to defeat the purpose of a general ReaderFactory that can handle any of the supported formats you throw at it.

Book1.xlsx Book1.xlsx.gz
Update: it looks like the underlying issue is that ReaderFactory's call to TarArchive.IsTarFile returns true for *.xlsx files: https://github.com/adamhathcock/sharpcompress/blob/master/src/SharpCompress/Readers/ReaderFactory.cs#L48
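
For anyone trying to reproduce this, the steps described in the issue could look roughly like the sketch below. It is not the original snippet from the report. It assumes the SharpCompress package is referenced and that Book1.xlsx.gz (created with gzip -k Book1.xlsx) sits in the working directory. The class name Repro is made up for illustration.

```csharp
using System;
using System.IO;
using SharpCompress.Readers;

// Hypothetical reproduction harness; the class name and file location are assumptions.
public static class Repro
{
    public static void Main()
    {
        // Assumed input: the gzipped workbook produced by `gzip -k Book1.xlsx`.
        using (Stream stream = File.OpenRead("Book1.xlsx.gz"))
        using (IReader reader = ReaderFactory.Open(stream))
        {
            while (reader.MoveToNextEntry())
            {
                // The issue reports that this prints raw Zip bytes
                // rather than a file name such as Book1.xlsx.
                Console.WriteLine(reader.Entry.Key);
            }
        }
    }
}
```

Running the same loop against the uncompressed Book1.xlsx should list the entries inside the workbook, which makes it easier to tell whether the odd key comes from the gzip layer or from the format detection that follows it.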