abicky closed 6 months ago
I checked a simple compress and decompress round trip.
require 'zstd-ruby'
expected_data = "あ"
data = Zstd.decompress(Zstd.compress(expected_data)).force_encoding('UTF-8')
puts <<~MSG
RUBY_VERSION: #{RUBY_VERSION}
data: #{data}
data == expected_data: #{data == expected_data}
data.equal?(expected_data): #{data.equal?(expected_data)}
data.hash: #{data.hash}
expected_data.hash: #{expected_data.hash}
{ expected_data => 1 }.has_key?(data): #{{ expected_data => 1 }.has_key?(data)}
MSG
RUBY_VERSION: 3.3.1
data: あ
data == expected_data: true
data.equal?(expected_data): false
data.hash: -3461927809926074668
expected_data.hash: -3461927809926074668
{ expected_data => 1 }.has_key?(data): true
I dug deeper into it.
irb(main):025> Zstd.decompress([COMPRESSED_DATA_HEX].pack('H*')).force_encoding('UTF-8').codepoints
=> [227, 129, 130]
irb(main):026> "あ".codepoints
=> [12354]
irb(main):027> Zstd.decompress(Zstd.compress("あ")).force_encoding('UTF-8').codepoints
=> [12354]
@SpringMT Thank you for your quick response! My colleague asked a question on the ruby-jp Slack (cf. https://ruby-jp.slack.com/archives/CLWSHA76V/p1716424178458799) and mame san found out that String#ascii_only? unexpectedly returned true for the decompressed string:
require 'zstd-ruby'
COMPRESSED_DATA_HEX = '28B52FFD0058180000E38182010000'
data = Zstd.decompress([COMPRESSED_DATA_HEX].pack('H*')).force_encoding('UTF-8')
expected_data = 'あ'
puts <<~MSG
RUBY_VERSION: #{RUBY_VERSION}
data: #{data}
data == expected_data: #{data == expected_data}
data.equal?(expected_data): #{data.equal?(expected_data)}
data.hash: #{data.hash}
expected_data.hash: #{expected_data.hash}
data.ascii_only?: #{data.ascii_only?}
expected_data.ascii_only?: #{expected_data.ascii_only?}
{ expected_data => 1 }.has_key?(data): #{{ expected_data => 1 }.has_key?(data)}
MSG
RUBY_VERSION: 3.3.1
data: あ
data == expected_data: true
data.equal?(expected_data): false
data.hash: 2035519590434718668
expected_data.hash: -25190453235001085
data.ascii_only?: true
expected_data.ascii_only?: false
{ expected_data => 1 }.has_key?(data): false
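The mismatch above can be narrowed down by comparing the properties that equality and Hash lookup rely on. The following is a minimal sketch, assuming two hypothetical helpers named string_report and report_diff (they are not part of Ruby or zstd-ruby). It builds the comparison string from raw bytes instead of calling Zstd.decompress, so it runs without the gem and only illustrates the kind of check used in this thread:

```ruby
# Hypothetical helper (not part of Ruby or zstd-ruby): collects the
# properties of a String that matter for equality and Hash lookup.
def string_report(str)
  {
    encoding: str.encoding.name,
    bytes: str.bytes,
    codepoints: str.codepoints,
    ascii_only: str.ascii_only?,
    valid_encoding: str.valid_encoding?,
    hash: str.hash
  }
end

# Hypothetical helper: returns the report keys whose values differ
# between the two strings.
def report_diff(a, b)
  ra = string_report(a)
  rb = string_report(b)
  ra.keys.reject { |key| ra[key] == rb[key] }
end

literal = 'あ'
# Stand-in for a decompressed string: the UTF-8 bytes of "あ",
# packed as binary and then tagged as UTF-8.
rebuilt = [0xE3, 0x81, 0x82].pack('C*').force_encoding('UTF-8')

p string_report(literal)
p report_diff(literal, rebuilt) # an empty list means the strings agree
```

Passing the string returned by Zstd.decompress as the second argument of report_diff would list which properties disagree with the literal; in the output above those would be ascii_only and hash.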
https://github.com/SpringMT/zstd-ruby/pull/90 should be able to fix it.
I released 1.5.6.6 (https://rubygems.org/gems/zstd-ruby/versions/1.5.6.6). Please test it 🙇
Awesome! I highly appreciate your support 🙇
We encountered a strange problem: we could not look up a value in a Hash whose key was generated by Zstd.decompress, even when we used an identical multibyte string literal. I found that the problem can be reproduced only if the compressed data doesn't have the Frame_Content_Size information, that is, only when decompress_buffered is used. I'm not sure whether it is a bug in Ruby or in zstd-ruby.
Here is the reproducible code:
Here is the output:
As you can see, { expected_data => 1 }.has_key?(data) is false even though data == expected_data is true. In Ruby 3.2.2, the result is as expected.
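Whether a given frame lacks Frame_Content_Size can be checked from its header alone. The following is a minimal sketch based on the Zstandard frame format (magic number, then the Frame_Header_Descriptor byte); the helper name frame_content_size_present? is hypothetical and not part of zstd-ruby, and the sketch ignores skippable frames and multi-frame input:

```ruby
# Hypothetical helper (not part of zstd-ruby): returns true when the
# frame header carries a Frame_Content_Size field, false otherwise.
def frame_content_size_present?(frame)
  magic = [0x28, 0xB5, 0x2F, 0xFD] # 0xFD2FB528 stored little-endian
  bytes = frame.bytes
  raise ArgumentError, 'not a zstd frame' unless bytes.first(4) == magic
  raise ArgumentError, 'truncated frame header' if bytes.size < 5

  descriptor = bytes[4]                  # Frame_Header_Descriptor
  fcs_flag = descriptor >> 6             # bits 7-6: Frame_Content_Size_flag
  single_segment = (descriptor >> 5) & 1 # bit 5: Single_Segment_flag

  # With flag 0 the field exists only when Single_Segment_flag is set.
  fcs_flag != 0 || single_segment == 1
end

# Same hex string as COMPRESSED_DATA_HEX in the reproduction.
compressed_data_hex = '28B52FFD0058180000E38182010000'
frame = [compressed_data_hex].pack('H*')
puts frame_content_size_present?(frame) # prints false for this frame
```

For the frame in this issue the descriptor byte is 0x00, so neither flag is set and the header has no content size.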