brianmario / charlock_holmes

Character encoding detection, brought to you by ICU
MIT License

Jruby? #70

Open ylluminate opened 10 years ago

ylluminate commented 10 years ago

Any suggestions for getting this working with JRuby? I see a JRuby fork, but it is quite dated (back at v0.1.2). I am presently preparing GitLab for deployment to a TorqueBox instance, but it's griping about needing this gem at 0.6.9.4, and of course it would have to be provided without a C extension.

brianmario commented 10 years ago

I think we'd probably need some FFI bindings or something similar to make this work with JRuby. I won't have time to tackle that for a while. Would you be interested in taking a stab at it?

headius commented 7 years ago

Just ran into this trying to run Mastodon :-)

I've only looked over the extension briefly, but it seems like the functionality provided here has got to exist somewhere in the JDK's Charset API.
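For illustration, here is a minimal sketch of what the public `java.nio.charset` API can do on its own, assuming the goal is only to find a charset in which the bytes decode cleanly. The class `StrictCharsetGuess` and the method `firstStrictMatch` are hypothetical names, not part of charlock_holmes or the JDK. This is strict decoding against a candidate list, not the statistical detection with confidence scores that ICU performs, so it would probably not reproduce the gem's results.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical helper for illustration only; not part of charlock_holmes or the JDK.
public class StrictCharsetGuess {

    // Returns the first candidate charset in which the bytes decode without
    // malformed-input or unmappable-character errors, or empty if none does.
    public static Optional<Charset> firstStrictMatch(byte[] bytes, List<Charset> candidates) {
        for (Charset candidate : candidates) {
            try {
                candidate.newDecoder()
                        .onMalformedInput(CodingErrorAction.REPORT)
                        .onUnmappableCharacter(CodingErrorAction.REPORT)
                        .decode(ByteBuffer.wrap(bytes));
                return Optional.of(candidate);
            } catch (CharacterCodingException e) {
                // The bytes are not valid in this charset; try the next candidate.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Candidate order matters: ISO-8859-1 accepts any byte sequence,
        // so it only makes sense as the last resort.
        List<Charset> candidates =
                Arrays.asList(StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        byte[] utf8 = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = "h\u00e9llo".getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(firstStrictMatch(utf8, candidates));
        System.out.println(firstStrictMatch(latin1, candidates));
    }
}
```

Because single-byte charsets such as ISO-8859-1 accept every input, this approach can only rule charsets out. It cannot rank them the way a detector does.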

brianmario commented 7 years ago

@headius I'd feel a lot better if we found some way to link to libicu for JRuby, if only so that the detection and transcoding results are consistent across all of the Ruby platforms this gem would be used on. But if that's going to be a huge pain, not performant enough, or just not possible, I think I'd be open to other options, assuming we can support the same API footprint already in this library.

What's the story with C bindings in JRuby these days?

headius commented 7 years ago

@brianmario It shouldn't be too hard to use FFI for this. I assume you are not doing anything magic to install or vendor libicu, so if it's present on the host system then we just need to wrap a few methods.

Now, the issue with this is that FFI is going to copy strings going into ICU, since on the JVM we can't provide direct pointers to them. So that will be a cost MRI does not have.

I did find this interesting nugget: http://userguide.icu-project.org/usefrom/jni

ICU does support a standard JNI binding, which we could call from JRuby like any other Java library. This may involve some complexity to get pre-built JNI binaries for various platforms, but it's another possible path. Performance-wise, it might be faster than FFI, or it might be about the same.

brodock commented 4 years ago

Could ICU4J be used for the JRuby version here?

http://site.icu-project.org/home/why-use-icu4j

headius commented 4 years ago

@brodock It probably could, and without writing a line of Java (though we could move some code into Java later for perf if necessary).