d-unsed / ruru

Native Ruby extensions written in Rust
MIT License
832 stars 40 forks source link

RString default encoding is ASCII-8Bit #67

Open mjc-gh opened 7 years ago

mjc-gh commented 7 years ago

I made a small example to show the "issue". It is available here: https://github.com/mikeycgto/ruru-rstring-encoding#example.

While this isn't incorrect it does feel a little odd since in modern Ruby and Rust both default to UTF-8 encoding.

Perhaps ruby-sys should expose rb_utf8_str_new as well? The RString struct could then maybe have a new_from_utf8 function which will ultimately call this Ruby C function. I can make the necessary PRs but I wanted some feedback first.

mjc-gh commented 7 years ago

Created two commits to deal with this issue:

d-unsed commented 7 years ago

71 will be merged after we release new version of ruby-sys

mjc-gh commented 7 years ago

Any update on releasing a new version of ruby-sys with steveklabnik/ruby-sys/pull/23 merged? Thanks!

steveklabnik commented 7 years ago

I've published ruby-sys 0.3.0

danielpclark commented 7 years ago

On encoding I'd be interested if we could implement the same encoding support that Ruby has. I'm playing around with some ideas. The main issue to test against is this test https://github.com/ruby/spec/blob/29b472cf57946ce271533fc6e03716908a313aaf/core/file/basename_spec.rb#L162-L166 which simply tests the same encoding goes in and out.

turboladen commented 6 years ago

I suppose this could be closed per #71, yeah?

d-unsed commented 6 years ago

I have one more question regarding encodings. Since Ruby by default creates a string with UTF-8 encoding ('str'.encoding => #<Encoding:UTF-8>), what do you think about changing the default behavior on the ruru side?

I mean the following steps:

  1. Create a new method RString::new_ascii which returns ascii-encoded string (behaves the same as current RString::new using rb_str_new)

  2. Change RString::new to return a UTF8-encoded string (behaves the same as current RString::new_utf8 using rb_utf8_str_new) and remove RString::new_utf8.

Then ruru and ruby will have the same default behavior. This can be implemented in 0.10.0

danielpclark commented 6 years ago

and remove RString::new_utf8

I think changing the RString::new method to expected behavior is a good idea. But I think you should keep the RString::new_utf8 method around as it will help people who want to be more explicit with their code. Anyone currently using RString::new_utf8 will be better off if it stays.

mjc-gh commented 6 years ago

I agree with making the default for RString::new be a UTF8 string since that's how Ruby behaves.

I think keeping RString::new_utf8 around is good idea as well. Perhaps we could also add RString::new_ascii which would returns an a ASCII-8bit encode string.

turboladen commented 6 years ago

I agree with making the default for RString::new be a UTF8 string since that's how Ruby behaves.

This totally makes sense to me.