ankane / tokenizers-ruby

Fast state-of-the-art tokenizers for Ruby
Apache License 2.0

I get an error when installing #3

Closed: kojix2 closed this issue 2 years ago

kojix2 commented 2 years ago

Hello! Thanks for making this gem.

However, it fails to install in my environment.

gem install tokenizers

I get the following error message:

Building native extensions. This could take a while...
ERROR:  Error installing tokenizers:
    ERROR: Failed to build gem native extension.

    current directory: /home/kojix2/.rbenv/versions/3.1.2/lib/ruby/gems/3.1.0/gems/tokenizers-0.1.1/ext/tokenizers
/home/kojix2/.rbenv/versions/3.1.2/bin/ruby -I /home/kojix2/.rbenv/versions/3.1.2/lib/ruby/3.1.0 -r ./siteconf20220909-19701-2a0rv0.rb extconf.rb

current directory: /home/kojix2/.rbenv/versions/3.1.2/lib/ruby/gems/3.1.0/gems/tokenizers-0.1.1/ext/tokenizers
make DESTDIR\= clean
make: Nothing to be done for 'clean'.

current directory: /home/kojix2/.rbenv/versions/3.1.2/lib/ruby/gems/3.1.0/gems/tokenizers-0.1.1/ext/tokenizers
make DESTDIR\=
cargo build --release --target-dir target
   Compiling libc v0.2.121
   Compiling cfg-if v1.0.0
   Compiling autocfg v1.1.0
   Compiling cc v1.0.73
   Compiling pkg-config v0.3.24
   Compiling proc-macro2 v1.0.36
   Compiling unicode-xid v0.2.2
   Compiling syn v1.0.89
   Compiling memchr v2.3.4
   Compiling lazy_static v1.4.0
   Compiling log v0.4.14
   Compiling version_check v0.9.4
   Compiling pin-project-lite v0.2.8
   Compiling bitflags v1.3.2
   Compiling bytes v1.1.0
   Compiling futures-core v0.3.21
   Compiling once_cell v1.10.0
   Compiling itoa v1.0.1
   Compiling futures-task v0.3.21
   Compiling typenum v1.15.0
   Compiling crossbeam-utils v0.8.8
   Compiling serde_derive v1.0.136
   Compiling serde v1.0.136
   Compiling foreign-types-shared v0.1.1
   Compiling fnv v1.0.7
   Compiling futures-util v0.3.21
   Compiling openssl v0.10.38
   Compiling ryu v1.0.9
   Compiling pin-utils v0.1.0
   Compiling unicode-width v0.1.9
   Compiling hashbrown v0.11.2
   Compiling native-tls v0.2.8
   Compiling futures-io v0.3.21
   Compiling slab v0.4.5
   Compiling futures-channel v0.3.21
   Compiling futures-sink v0.3.21
   Compiling tinyvec_macros v0.1.0
   Compiling matches v0.1.9
   Compiling httparse v1.6.0
   Compiling crc32fast v1.3.2
   Compiling radium v0.5.3
   Compiling percent-encoding v2.1.0
   Compiling adler v1.0.2
   Compiling strsim v0.9.3
   Compiling getrandom v0.1.16
   Compiling try-lock v0.2.3
   Compiling ident_case v1.0.1
   Compiling scopeguard v1.1.0
   Compiling openssl-probe v0.1.5
   Compiling ppv-lite86 v0.2.16
   Compiling regex-syntax v0.6.25
   Compiling rayon-core v1.9.1
   Compiling either v1.6.1
   Compiling lexical-core v0.7.6
   Compiling httpdate v1.0.2
   Compiling encoding_rs v0.8.30
   Compiling tower-service v0.3.1
   Compiling unicode-bidi v0.3.7
   Compiling static_assertions v1.1.0
   Compiling wyz v0.2.0
   Compiling tap v1.0.1
   Compiling serde_json v1.0.79
   Compiling funty v1.1.0
   Compiling byteorder v1.4.3
   Compiling arrayvec v0.5.2
   Compiling cpufeatures v0.2.2
   Compiling derive_builder v0.9.0
   Compiling ipnet v2.4.0
   Compiling fastrand v1.7.0
   Compiling remove_dir_all v0.5.3
   Compiling mime v0.3.16
   Compiling number_prefix v0.4.0
   Compiling base64 v0.13.0
   Compiling unicode-segmentation v1.9.0
   Compiling glob v0.3.0
   Compiling base64 v0.12.3
   Compiling number_prefix v0.3.0
   Compiling macro_rules_attribute-proc_macro v0.0.2
   Compiling vec_map v0.8.2
   Compiling strsim v0.8.0
   Compiling rutie v0.8.4
   Compiling ansi_term v0.12.1
   Compiling smallvec v1.8.0
   Compiling unicode_categories v0.1.1
   Compiling paste v1.0.6
   Compiling tracing-core v0.1.23
   Compiling memoffset v0.6.5
   Compiling indexmap v1.8.0
   Compiling miniz_oxide v0.4.4
   Compiling crossbeam-epoch v0.9.8
   Compiling rayon v1.5.1
   Compiling generic-array v0.14.5
   Compiling nom v6.2.1
   Compiling foreign-types v0.3.2
   Compiling http v0.2.6
   Compiling textwrap v0.11.0
   Compiling tinyvec v1.5.1
   Compiling openssl-sys v0.9.72
   Compiling bzip2-sys v0.1.11+1.0.8
   Compiling onig_sys v69.7.1
   Compiling esaxx-rs v0.1.7
   Compiling form_urlencoded v1.0.1
   Compiling itertools v0.8.2
   Compiling itertools v0.9.0
   Compiling macro_rules_attribute v0.0.2
   Compiling unicode-normalization-alignments v0.1.12
   Compiling tracing v0.1.32
   Compiling unicode-normalization v0.1.19
   Compiling aho-corasick v0.7.15
   Compiling num_cpus v1.13.1
   Compiling socket2 v0.4.4
   Compiling getrandom v0.2.5
   Compiling terminal_size v0.1.17
   Compiling time v0.1.43
   Compiling filetime v0.2.15
   Compiling xattr v0.2.2
   Compiling fs2 v0.4.3
   Compiling atty v0.2.14
   Compiling tempfile v3.3.0
   Compiling dirs-sys v0.3.7
   Compiling http-body v0.4.4
   Compiling mio v0.8.2
   Compiling want v0.3.0
   Compiling quote v1.0.16
   Compiling crossbeam-channel v0.5.4
   Compiling bitvec v0.19.6
   Compiling regex v1.4.6
   Compiling idna v0.2.3
   Compiling rand_core v0.6.3
   Compiling rand_core v0.5.1
   Compiling tar v0.4.38
   Compiling clap v2.34.0
   Compiling dirs v3.0.2
   Compiling tokio v1.17.0
   Compiling flate2 v1.0.22
   Compiling block-buffer v0.10.2
   Compiling crypto-common v0.1.3
   Compiling url v2.2.2
   Compiling rand_chacha v0.3.1
   Compiling rand_chacha v0.2.2
   Compiling console v0.15.0
   Compiling bzip2 v0.4.3
   Compiling crossbeam-deque v0.8.1
   Compiling digest v0.10.3
   Compiling rand v0.8.5
   Compiling rand v0.7.3
   Compiling tokio-util v0.6.9
   Compiling indicatif v0.16.2
   Compiling indicatif v0.15.0
   Compiling darling_core v0.10.2
   Compiling onig v6.3.1
   Compiling sha2 v0.10.2
   Compiling tokio-native-tls v0.3.0
   Compiling h2 v0.3.12
   Compiling thiserror-impl v1.0.30
   Compiling darling_macro v0.10.2
   Compiling darling v0.10.2
   Compiling derive_builder_core v0.9.0
   Compiling thiserror v1.0.30
   Compiling zip v0.5.13
   Compiling zip-extensions v0.6.1
   Compiling rayon-cond v0.1.0
   Compiling hyper v0.14.17
   Compiling serde_urlencoded v0.7.1
   Compiling spm_precompiled v0.1.3
   Compiling hyper-tls v0.5.0
   Compiling reqwest v0.11.10
   Compiling cached-path v0.5.3
   Compiling tokenizers v0.11.3
   Compiling tokenizers-ruby v0.1.0 (/home/kojix2/.rbenv/versions/3.1.2/lib/ruby/gems/3.1.0/gems/tokenizers-0.1.1)
    Finished release [optimized] target(s) in 1m 22s
mv target/release/libtokenizers.so ../../lib/tokenizers/ext.so

current directory: /home/kojix2/.rbenv/versions/3.1.2/lib/ruby/gems/3.1.0/gems/tokenizers-0.1.1/ext/tokenizers
make DESTDIR\= install
cargo build --release --target-dir target
    Finished release [optimized] target(s) in 0.09s
mv target/release/libtokenizers.so ../../lib/tokenizers/ext.so
mv: 'target/release/libtokenizers.so' and '../../lib/tokenizers/ext.so' are the same file
make: *** [Makefile:3: install] Error 1

make install failed, exit code 2

Gem files will remain installed in /home/kojix2/.rbenv/versions/3.1.2/lib/ruby/gems/3.1.0/gems/tokenizers-0.1.1 for inspection.
Results logged to /home/kojix2/.rbenv/versions/3.1.2/lib/ruby/gems/3.1.0/extensions/x86_64-linux/3.1.0/tokenizers-0.1.1/gem_make.out

However, I was able to build and try it using the development setup:

git clone https://github.com/ankane/tokenizers-ruby.git
cd tokenizers-ruby
bundle install
bundle exec ruby ext/tokenizers/extconf.rb && make
bundle exec rake download:files
bundle exec rake test

I tried GPT-2 with onnxruntime, and it works just fine!

require "tokenizers"
require "onnxruntime"
require "numo/narray"

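tokenizer = Tokenizers.from_pretrained("gpt2")
model = OnnxRuntime::Model.new("gpt2-lm-head-10.onnx")

s = "Why do cats want to ride on the keyboard?"

# Token ids for the prompt
ids = tokenizer.encode(s).ids

# Greedy decoding: generate 10 more tokens, one at a time
10.times do
  o = model.predict({ input1: [[ids]] })
  # output1 holds the next-token scores (logits); take the first entry
  o = Numo::DFloat.cast(o["output1"][0])
  # Append the highest-scoring token at the last position
  ids << o[true, -1, true].argmax
end

puts tokenizer.decode(ids)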

🐈 ⌨️ ❓

Why do cats want to ride on the keyboard?

The answer is that they do.
kojix2 commented 2 years ago

I used GitHub Actions to reproduce the problem: https://github.com/kojix2/tokenizers-ruby/runs/8263813372?check_suite_focus=true

name: install-test
on: [push, pull_request]
jobs:
  build:
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-latest, macos-latest]
    runs-on: ${{ matrix.os }}
    steps:
    - uses: actions-rs/toolchain@v1
      with:
        toolchain: stable
    - uses: ruby/setup-ruby@v1
      with:
        ruby-version: 3.1
        bundler-cache: true
    - run: gem install tokenizers 
ankane commented 2 years ago

Hey @kojix2, thanks for the detailed report and the PR. It looks like the build was trying to move the shared library twice (once with make and once with make install). The commit above should fix it. I need to push a follow-up commit to fix CI, and will push a new version shortly.
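In the log above, make had already moved libtokenizers.so to lib/tokenizers/ext.so, and when make install repeated the mv, mv refused because it saw the source and destination as the same file. The actual fix is in the commit referenced above and is not reproduced here. As a hypothetical illustration only, the Ruby sketch below shows an install step that is safe to run twice: it skips the work when File.identical? reports that both paths refer to the same file, and otherwise copies rather than moves, so the build output stays where cargo left it. The helper name install_shared_library is an assumption and is not part of tokenizers-ruby.

```ruby
require "fileutils"

# Hypothetical helper (not part of tokenizers-ruby): install a freshly built
# shared library in a way that is safe to run more than once, for example
# from both `make` and `make install`.
def install_shared_library(src, dest)
  # File.identical? is true when both paths point at the same file (e.g. two
  # hard links to one inode); FileUtils.cp would raise ArgumentError
  # ("same file") in that case, so there is nothing to do.
  return :skipped if File.identical?(src, dest)

  FileUtils.mkdir_p(File.dirname(dest))
  # Copy instead of move so the build output stays in place and a second
  # run still finds it.
  FileUtils.cp(src, dest)
  :installed
end
```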

kojix2 commented 2 years ago

Thank you for your quick response!

ankane commented 2 years ago

Success 🎉 https://github.com/ankane/tokenizers-ruby/runs/8264236703?check_suite_focus=true

kojix2 commented 2 years ago

As expected! I have confirmed that it installs successfully. Thank you.