Open tjoneseng opened 11 years ago
This is definitely interesting... Have you tried this on different ICU versions to see if it's any different?
With CharlockHolmes::Converter.convert content, "IBM420_ltr", 'UTF-8' I got an error, ArgumentError (U_FILE_ACCESS_ERROR):
If you have built libicu with the --with-data-packaging=files
configuration option, you may need to set the ICU_DATA
environment variable. See http://userguide.icu-project.org/icudata for more details.
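Until the data problem itself is sorted out, one way to keep an upload or import from blowing up is to rescue the ArgumentError and fall back to scrubbed bytes. This is only a sketch: `convert_or_scrub` is a hypothetical helper (not part of charlock_holmes), the real converter call is passed in as a block, and it assumes that treating the input as UTF-8 with invalid bytes replaced is acceptable for your data.

```ruby
# Hypothetical helper, not part of charlock_holmes: run the conversion given
# as a block and fall back to scrubbed UTF-8 if it raises ArgumentError.
def convert_or_scrub(content, source_encoding)
  yield(content, source_encoding)
rescue ArgumentError
  # Assumption: interpreting the bytes as UTF-8 and replacing invalid
  # sequences with '?' is an acceptable fallback for the caller.
  content.dup.force_encoding(Encoding::UTF_8).scrub('?')
end

# With the gem loaded, the block would wrap the real call:
#   convert_or_scrub(content, 'IBM420_ltr') do |c, enc|
#     CharlockHolmes::Converter.convert(c, enc, 'UTF-8')
#   end

# Stand-in converter that fails the way the report above describes.
failing = ->(_c, _enc) { raise ArgumentError, 'U_FILE_ACCESS_ERROR' }
puts convert_or_scrub('jobs', 'IBM420_ltr', &failing)
```

The fallback hides the underlying ICU data problem rather than fixing it, so it is probably worth logging the rescued error as well.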
We ran into the same issue at GitLab: https://gitlab.com/gitlab-org/gitlab-ce/issues/17415#note_13867854
@stanhu
Seems like I'm the only idiot that doesn't know where I find the f**** data directory. What the hell do I set the env variable to? Searching for this directory for hours now...
@phillipp What platform are you using? On Ubuntu 16.04, I see it in /usr/lib/x86_64-linux-gnu:
$ dpkg -L libicu55
/.
/usr
/usr/share
/usr/share/lintian
/usr/share/lintian/overrides
/usr/share/lintian/overrides/libicu55
/usr/share/doc
/usr/share/doc/libicu55
/usr/share/doc/libicu55/copyright
/usr/share/doc/libicu55/changelog.Debian.gz
/usr/share/doc/libicu55/NEWS.Debian.gz
/usr/lib
/usr/lib/x86_64-linux-gnu
/usr/lib/x86_64-linux-gnu/libicui18n.so.55.1
/usr/lib/x86_64-linux-gnu/libicutest.so.55.1
/usr/lib/x86_64-linux-gnu/libiculx.so.55.1
/usr/lib/x86_64-linux-gnu/libicutu.so.55.1
/usr/lib/x86_64-linux-gnu/libicudata.so.55.1
/usr/lib/x86_64-linux-gnu/libicule.so.55.1
/usr/lib/x86_64-linux-gnu/libicuuc.so.55.1
/usr/lib/x86_64-linux-gnu/libicuio.so.55.1
/usr/lib/x86_64-linux-gnu/libicudata.so.55
/usr/lib/x86_64-linux-gnu/libicui18n.so.55
/usr/lib/x86_64-linux-gnu/libicuuc.so.55
/usr/lib/x86_64-linux-gnu/libicutest.so.55
/usr/lib/x86_64-linux-gnu/libicule.so.55
/usr/lib/x86_64-linux-gnu/libiculx.so.55
/usr/lib/x86_64-linux-gnu/libicuio.so.55
/usr/lib/x86_64-linux-gnu/libicutu.so.55
On macOS:
$ brew list icu4c
/usr/local/Cellar/icu4c/59.1/bin/derb
/usr/local/Cellar/icu4c/59.1/bin/genbrk
/usr/local/Cellar/icu4c/59.1/bin/gencfu
/usr/local/Cellar/icu4c/59.1/bin/gencnval
/usr/local/Cellar/icu4c/59.1/bin/gendict
/usr/local/Cellar/icu4c/59.1/bin/genrb
/usr/local/Cellar/icu4c/59.1/bin/icu-config
/usr/local/Cellar/icu4c/59.1/bin/icuinfo
/usr/local/Cellar/icu4c/59.1/bin/makeconv
/usr/local/Cellar/icu4c/59.1/bin/pkgdata
/usr/local/Cellar/icu4c/59.1/bin/uconv
/usr/local/Cellar/icu4c/59.1/include/unicode/ (175 files)
/usr/local/Cellar/icu4c/59.1/lib/libicudata.59.1.dylib
/usr/local/Cellar/icu4c/59.1/lib/libicui18n.59.1.dylib
/usr/local/Cellar/icu4c/59.1/lib/libicuio.59.1.dylib
/usr/local/Cellar/icu4c/59.1/lib/libicutest.59.1.dylib
/usr/local/Cellar/icu4c/59.1/lib/libicutu.59.1.dylib
/usr/local/Cellar/icu4c/59.1/lib/libicuuc.59.1.dylib
/usr/local/Cellar/icu4c/59.1/lib/icu/ (4 files)
/usr/local/Cellar/icu4c/59.1/lib/pkgconfig/ (3 files)
/usr/local/Cellar/icu4c/59.1/lib/ (18 other files)
/usr/local/Cellar/icu4c/59.1/sbin/escapesrc
/usr/local/Cellar/icu4c/59.1/sbin/genccode
/usr/local/Cellar/icu4c/59.1/sbin/gencmn
/usr/local/Cellar/icu4c/59.1/sbin/gennorm2
/usr/local/Cellar/icu4c/59.1/sbin/gensprep
/usr/local/Cellar/icu4c/59.1/sbin/icupkg
/usr/local/Cellar/icu4c/59.1/share/icu/ (4 files)
/usr/local/Cellar/icu4c/59.1/share/man/ (14 files)
@stanhu Ouch, I thought it would be some kind of data file, not a lib, and looked in /usr/share. Thanks for the help!
I just got bitten by this, too. I believe that ICU can't handle short strings well.
CharlockHolmes::EncodingDetector.detect("Esha")
=> {:type=>:text,
:encoding=>"IBM424_ltr",
:ruby_encoding=>"binary",
:confidence=>60,
:language=>"he"}
My solution here is to grab a larger section of the work, analyze and convert all at once and hope for the best. It has, so far, helped.
Here's some sample code from a CSV file uploader...
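def initialize(http_uploaded_file)
  # Switch to binary mode so detection sees the bytes exactly as uploaded.
  http_uploaded_file.to_io.binmode
  detection = CharlockHolmes::EncodingDetector.detect(http_uploaded_file.read)
  # Rewind so the second read starts from the beginning of the file.
  http_uploaded_file.rewind
  @text = CharlockHolmes::Converter.convert(http_uploaded_file.read, detection[:encoding], 'UTF-8')
end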
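A possible complement to detecting on a larger chunk is to distrust the detector when the input is short or the reported confidence is low, as with the "Esha" result above (confidence 60). The sketch below only inspects the hash that the detector returns. `trusted_encoding`, `MIN_CONFIDENCE`, `MIN_BYTES` and the UTF-8 fallback are all assumptions made for illustration, not part of charlock_holmes.

```ruby
# Hypothetical thresholds; tune them for your own data.
MIN_CONFIDENCE = 80
MIN_BYTES = 64

# Hypothetical helper: return the detected encoding only when the input was
# long enough and the detector was confident enough, else the fallback.
def trusted_encoding(detection, byte_length, fallback: 'UTF-8')
  return fallback if detection.nil? || detection[:encoding].nil?
  return fallback if byte_length < MIN_BYTES
  return fallback if detection[:confidence].to_i < MIN_CONFIDENCE

  detection[:encoding]
end

# The detection result reported above for the 4-byte string "Esha".
detection = { type: :text, encoding: 'IBM424_ltr', confidence: 60, language: 'he' }
puts trusted_encoding(detection, 'Esha'.bytesize)
```

In the uploader, the returned value could be passed to the converter in place of detection[:encoding].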
Good day.
I'm on Ubuntu 18.04, running under Vagrant. When I call CharlockHolmes::Converter.convert content, "IBM420_ltr", 'UTF-8' from the gem, I get the error U_FILE_ACCESS_ERROR. ICU 60.2 was already installed; I don't know how it was built (--with-data-packaging=files?). After reading the comments, I decided to set ICU_DATA.
vagrant@rails-dev-box:/vagrant/UploadFiles$ locate "icu"
/usr/lib/x86_64-linux-gnu/libicudata.so.60
/usr/lib/x86_64-linux-gnu/libicudata.so.60.2
/usr/lib/x86_64-linux-gnu/libicui18n.so.60
/usr/lib/x86_64-linux-gnu/libicui18n.so.60.2
/usr/lib/x86_64-linux-gnu/libicuio.so.60
/usr/lib/x86_64-linux-gnu/libicuio.so.60.2
/usr/lib/x86_64-linux-gnu/libicutest.so.60
/usr/lib/x86_64-linux-gnu/libicutest.so.60.2
/usr/lib/x86_64-linux-gnu/libicutu.so.60
/usr/lib/x86_64-linux-gnu/libicutu.so.60.2
/usr/lib/x86_64-linux-gnu/libicuuc.so.60
/usr/lib/x86_64-linux-gnu/libicuuc.so.60.2
/usr/share/doc/libicu60
/usr/share/doc/libicu60/changelog.Debian.gz
/usr/share/doc/libicu60/copyright
/usr/share/lintian/overrides/libicu60
/usr/src/linux-headers-4.15.0-36/include/dt-bindings/interrupt-controller/mvebu-icu.h
/var/lib/dpkg/info/libicu60:amd64.list
/var/lib/dpkg/info/libicu60:amd64.md5sums
/var/lib/dpkg/info/libicu60:amd64.shlibs
/var/lib/dpkg/info/libicu60:amd64.triggers
Based on the comment above, I set ICU_DATA=/usr/lib/x86_64-linux-gnu/. It did not help. Setting it to /usr/share/icu/ or /usr/share/icu/60.2/ did not help either. I added export ICU_DATA=/usr/lib/x86_64-linux-gnu/ to the /etc/environment file, and the env command shows that ICU_DATA is set.
I also tried it on Ubuntu 16.04. There, following the instructions, I ran apt-get install libicu-dev before installing the gem. The gem installed, but the error remained. I also tried running sudo bundle config build.charlock_holmes --with-icu-lib=/usr/lib/x86_64-linux-gnu/ or --with-icu-dir=/usr/lib/x86_64-linux-gnu/
The error occurs on a string of 35 characters.
I am a novice and do not understand: which files should the ICU_DATA path point to? And which files should the --with-icu-dir path point to? Or how do I properly reinstall ICU?
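For anyone debugging the same thing, a small script can at least show what the Ruby process sees in ICU_DATA and what sits in that directory. This is a rough sketch under assumptions: `icu_data_report` is a hypothetical helper, and the `icudt*.dat` pattern is based on my understanding of how ICU names a standalone data archive, so treat an empty result as a hint rather than proof.

```ruby
# Hypothetical diagnostic, not part of charlock_holmes or ICU: report whether
# a directory exists and which ICU data candidates it contains.
def icu_data_report(dir)
  unless dir && Dir.exist?(dir)
    return { dir: dir, exists: false, dat_files: [], shared_libs: [] }
  end

  {
    dir: dir,
    exists: true,
    # Assumption: standalone data archives are named like icudt<version>l.dat.
    dat_files: Dir.glob(File.join(dir, 'icudt*.dat')).sort,
    # Data built into a shared library, as in the package listings above.
    shared_libs: Dir.glob(File.join(dir, 'libicudata*')).sort
  }
end

# Inspect whatever the current process has in ICU_DATA (nil when unset).
puts icu_data_report(ENV['ICU_DATA']).inspect
```

Running it from the same shell or service that runs the Rails app matters, since a variable set in /etc/environment may not be visible to an already running session.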
I don't know if this is a charlock_holmes issue or an ICU issue. For some reason the magic string "jobs" causes an explosion. I have no idea why it would insist that is IBM420_ltr (which I've never even heard of before) but the singular version is not:
I tried giving it a hint but no-go: