garbled when making tar with non-usascii filename.

itn3000 commented 5 years ago

When I create a tar archive containing a file with a non-US-ASCII filename (Japanese, in my case), the non-ASCII characters come back garbled. Here is my test case (xUnit):

        [Fact]
        public void Tar_Japanese_FileName()
        {
            var data = new byte[1];
            // this is Japanese 'あ'
            var fname = new string(new char[1]{ (char)0x3042 });
            using (var mstm = new MemoryStream())
            {
                var opts = new TarWriterOptions(CompressionType.None);
                using (var tw = new TarWriter(mstm, opts))
                {
                    using (var dstm = new MemoryStream(data))
                    {
                        tw.Write(fname, dstm, System.DateTime.Now);
                    }
                }
                using(var mstm2 = new MemoryStream(mstm.ToArray()))
                {
                    var ropts = new SharpCompress.Readers.ReaderOptions();
                    ropts.ArchiveEncoding.Default = System.Text.Encoding.UTF8;
                    using(var tr = new SharpCompress.Readers.Tar.TarReader(mstm2, ropts, CompressionType.None))
                    {
                        Assert.True(tr.MoveToNextEntry());
                        // this assertion fails: expected = 0x3042, actual = 0x42
                        Assert.Equal(fname, tr.Entry.Key);
                    }
                }
            }
        }
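
The values in the failing assertion (expected 0x3042, actual 0x42) are consistent with each UTF-16 char being narrowed to a single byte somewhere on the write path, which keeps only the low 8 bits. That is an assumption about the cause, since the writer's internals are not shown here. The sketch below only demonstrates the effect; `NarrowingDemo` and `NarrowPerChar` are hypothetical names, not SharpCompress APIs.

```csharp
using System;
using System.Text;

public static class NarrowingDemo
{
    // Hypothetical stand-in for a writer that copies one byte per char.
    public static byte[] NarrowPerChar(string name)
    {
        var bytes = new byte[name.Length];
        for (int i = 0; i < name.Length; i++)
        {
            // The cast keeps only the low 8 bits of the UTF-16 code unit: 0x3042 -> 0x42.
            bytes[i] = unchecked((byte)name[i]);
        }
        return bytes;
    }

    public static void Main()
    {
        string name = "\u3042"; // Japanese 'あ'
        Console.WriteLine(BitConverter.ToString(NarrowPerChar(name)));          // 42
        Console.WriteLine(BitConverter.ToString(Encoding.UTF8.GetBytes(name))); // E3-81-82
    }
}
```

Encoding the name with UTF-8 before copying it into the header keeps all three bytes, which is what the reader needs when ArchiveEncoding.Default is set to UTF8.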

KevinErath commented 5 years ago

I have the same issue with German umlauts like ö. I tracked it down to the TarHeader class. Changing how the Write() method handles the filename seems to fix it. Here is a quick sample I hacked together:

        internal void Write(Stream output)
        {
            byte[] buffer = new byte[BLOCK_SIZE];

            WriteOctalBytes(511, buffer, 100, 8); // file mode
            WriteOctalBytes(0, buffer, 108, 8); // owner ID
            WriteOctalBytes(0, buffer, 116, 8); // group ID

            //ArchiveEncoding.UTF8.GetBytes("magic").CopyTo(buffer, 257);

            byte[] nameBytes = Encoding.UTF8.GetBytes(Name);

            if (nameBytes.Length > 100)
            {
                // Set mock filename and filetype to indicate the next block is the actual name of the file
                WriteStringBytes("././@LongLink", buffer, 0, 100);
                buffer[156] = (byte)EntryType.LongName;
                // Size of the long-name block is counted in encoded bytes, not chars
                WriteOctalBytes(nameBytes.Length + 1, buffer, 124, 12);
            }
            else
            {
                WriteBytes(nameBytes, buffer, 0, 100);

                WriteOctalBytes(Size, buffer, 124, 12);
                var time = (long)(LastModifiedTime.ToUniversalTime() - EPOCH).TotalSeconds;
                WriteOctalBytes(time, buffer, 136, 12);
                buffer[156] = (byte)EntryType;

                if (Size >= 0x1FFFFFFFF)
                {
                    byte[] bytes = DataConverter.BigEndian.GetBytes(Size);
                    var bytes12 = new byte[12];
                    bytes.CopyTo(bytes12, 12 - bytes.Length);
                    bytes12[0] |= 0x80;
                    bytes12.CopyTo(buffer, 124);
                }
            }

            int crc = RecalculateChecksum(buffer);
            WriteOctalBytes(crc, buffer, 148, 8);

            output.Write(buffer, 0, buffer.Length);

            if (nameBytes.Length > 100)
            {
                WriteLongFilenameHeader(output);
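                // Truncate by encoded bytes, not chars: 100 chars can still exceed 100 bytes
                // in UTF-8, which would make the recursive call below loop forever.
                // The headroom covers a character cut at the boundary (decoded as U+FFFD).
                Name = Encoding.UTF8.GetString(nameBytes, 0, 100 - Encoding.UTF8.GetMaxByteCount(1));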
                Write(output);
            }
        }

        private static void WriteBytes(byte[] source, byte[] buffer, int offset, int length)
        {
            int i;

            for (i = 0; i < length && i < source.Length; ++i)
            {
                buffer[offset + i] = source[i];
            }

            for (; i < length; ++i)
            {
                buffer[offset + i] = 0;
            }
        }
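
The name field in a tar header is 100 bytes, so both the length check and any truncation have to be measured in encoded bytes rather than in chars. As a minimal sketch (assuming UTF-8; `TarNameDemo` and `TruncateToUtf8Bytes` are hypothetical names and not part of SharpCompress), this shows how far the two counts can diverge and one way to cut a name without splitting a character:

```csharp
using System;
using System.Text;

public static class TarNameDemo
{
    private const int NameFieldSize = 100; // tar header name field, in bytes

    // Hypothetical helper: longest prefix of name whose UTF-8 encoding fits in maxBytes.
    public static string TruncateToUtf8Bytes(string name, int maxBytes)
    {
        int bytes = 0;
        int i = 0;
        while (i < name.Length)
        {
            // Keep surrogate pairs together so no character is split.
            bool isPair = char.IsHighSurrogate(name[i])
                          && i + 1 < name.Length
                          && char.IsLowSurrogate(name[i + 1]);
            int charCount = isPair ? 2 : 1;
            int size = Encoding.UTF8.GetByteCount(name.Substring(i, charCount));
            if (bytes + size > maxBytes)
            {
                break;
            }
            bytes += size;
            i += charCount;
        }
        return name.Substring(0, i);
    }

    public static void Main()
    {
        string name = new string('\u3042', 100); // 100 chars, 3 bytes each in UTF-8
        Console.WriteLine(name.Length);                           // 100
        Console.WriteLine(Encoding.UTF8.GetByteCount(name));      // 300
        string shortName = TruncateToUtf8Bytes(name, NameFieldSize);
        Console.WriteLine(Encoding.UTF8.GetByteCount(shortName)); // 99
    }
}
```
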

It would be awesome if this could be fixed soon.

adamhathcock commented 5 years ago

The Tar implementation is kind of a mess, imo. For personal reasons, I just don't have the time or the mental energy to redo it all by myself.

If you could make a real PR for your fix, I'd take a look. Thanks!

adamhathcock / sharpcompress

garbled when making tar with non-usascii filename. #414