adamhathcock / sharpcompress

SharpCompress is a fully managed C# library to deal with many compression types and formats.
MIT License

Encoding with ArchiveFactory.WriteToDirectory #597

Open TheRisenPhoenix opened 3 years ago

TheRisenPhoenix commented 3 years ago

I've run into an issue while extracting a zip archive. The archive contains a file whose name includes a German umlaut: "Übung.txt". I use ArchiveFactory.WriteToDirectory to extract the archive, but the extracted file shows a question mark instead of "Ü", so apparently the wrong encoding is used. I am using Windows 10. The method accepts an options argument, but it does not contain anything encoding-related.

I searched the documentation and came across an example usage of reading the file and extracting it. There, you can provide some reader options:

private static void ExtractArchive(string source, string destination) {
    var opts = new ReaderOptions();
    var encoding = Encoding.GetEncoding(1252);

    opts.ArchiveEncoding = new ArchiveEncoding {
        CustomDecoder = (data, _, _) => encoding.GetString(data),
    };

    using Stream inStream = File.OpenRead(source);
    using Stream outStream = File.OpenWrite(destination);
    using var reader = ReaderFactory.Open(inStream, opts);
    while (reader.MoveToNextEntry())
    {
        if (!reader.Entry.IsDirectory) {
            using var entryStream = reader.OpenEntryStream();
            entryStream.CopyTo(outStream);
        }
    }
}

I tried several different encodings, but none of them worked.

Am I doing something incorrectly, or might this be a bug? Also, I'm wondering why ReaderFactory.Open has an options argument that carries encoding information, but ArchiveFactory.WriteToDirectory doesn't.

adamhathcock commented 3 years ago

There could be several things going wrong here: the encoding used within the archive, and/or the encoding applied at the code level once the name is turned into a string. I'm not the best with encodings, so I'm not sure. I'd need a sample file to see more.
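
To illustrate the first possibility, here is a minimal, self-contained sketch of how the same stored bytes turn into different names depending on the code page used to decode them. It assumes, purely for the demo, that the archive stored the name as CP437 bytes (the ZIP specification's default when the UTF-8 flag is not set); the real sample file may well use something else. The class name is made up for illustration.

```csharp
using System;
using System.Text;

static class FileNameDecodingDemo
{
    static void Main()
    {
        // On .NET Core / .NET 5+ legacy code pages such as 437 and 1252
        // are only available after registering this provider.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        const string name = "Übung.txt";
        Encoding cp437 = Encoding.GetEncoding(437);
        Encoding cp1252 = Encoding.GetEncoding(1252);

        // Assumption for this demo: the archive stored the name as CP437 bytes.
        byte[] stored = cp437.GetBytes(name);

        // Decoding with the same code page round-trips the name.
        Console.WriteLine(cp437.GetString(stored));

        // Decoding with a different single-byte code page silently yields
        // a different first character.
        Console.WriteLine(cp1252.GetString(stored));

        // The byte used for "Ü" in CP437 is not valid UTF-8 on its own,
        // so a UTF-8 decoder substitutes the replacement character U+FFFD.
        Console.WriteLine(Encoding.UTF8.GetString(stored));
    }
}
```

If the decoded name contains a character the target code page or console cannot represent, it is typically shown as a question mark or a replacement glyph, which would match the symptom described above.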

WriteToDirectory is an extension method that's just a helper. It's not meant to cover all scenarios.
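
Since the helper does not take reader options, one possible workaround is to open the archive yourself and extract entry by entry. The following is only a sketch: the `ManualExtraction.Extract` helper and its parameters are hypothetical, and it assumes a SharpCompress version in which `ArchiveFactory.Open` accepts `ReaderOptions` and `ArchiveEncoding.Default` is settable. Which encoding actually fits the sample archive is not known here.

```csharp
using System.IO;
using System.Text;
using SharpCompress.Archives;
using SharpCompress.Common;
using SharpCompress.Readers;

static class ManualExtraction
{
    // Hypothetical helper (not part of SharpCompress): extracts every file
    // entry of the archive into destinationDirectory, decoding entry names
    // with nameEncoding when the archive does not mark them as UTF-8.
    public static void Extract(string archivePath, string destinationDirectory, Encoding nameEncoding)
    {
        var readerOptions = new ReaderOptions
        {
            ArchiveEncoding = new ArchiveEncoding { Default = nameEncoding }
        };
        var extractionOptions = new ExtractionOptions
        {
            ExtractFullPath = true, // keep the folder structure stored in the archive
            Overwrite = true        // replace files that already exist
        };

        Directory.CreateDirectory(destinationDirectory);

        using var archive = ArchiveFactory.Open(archivePath, readerOptions);
        foreach (var entry in archive.Entries)
        {
            if (!entry.IsDirectory)
            {
                // Each entry is written to its own file under the destination.
                entry.WriteToDirectory(destinationDirectory, extractionOptions);
            }
        }
    }
}
```

A call might look like `ManualExtraction.Extract("sample.zip", "out", Encoding.GetEncoding(850))`, where the file name, folder, and code page are placeholders. Unlike the snippet in the question, which copies every entry into one output stream opened on `destination`, this writes each entry to a separate file named after the entry.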

TheRisenPhoenix commented 3 years ago

Thanks for your reply!

I prepared a sample file; I hope it helps you track down the issue.

Is there anything more I can do to spot the problem?