ankane / torch.rb

Deep learning for Ruby, powered by LibTorch

Undefined symbol: _ZN5torch5utils14cuda_lazy_initEv when running on CUDA #16

Closed golirev closed 3 years ago

golirev commented 3 years ago

When running in a CUDA environment, the following error occurred.

(py37) tsuyoshi@Jupiter:~/work/ruby/torch.rb$ bundle exec ruby examples/mnist/main.rb
Device type: cuda
/home/tsuyoshi/.rvm/rubies/ruby-2.7.2/bin/ruby: symbol lookup error: /home/tsuyoshi/work/ruby/torch.rb/lib/torch/ext.so: undefined symbol: _ZN5torch5utils14cuda_lazy_initEv

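I searched for the symbol and it seems to be defined in libtorch_python.so, but linking against that library brings in a dependency on Python. As a test, I removed the torch::utils::maybe_initialize_cuda call from the code generator, since that is the call that leads to this symbol, and the example then worked.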
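To make the mangled name in the error easier to read, here is a minimal sketch of a demangler. It assumes the symbol is a simple Itanium C++ ABI nested name with an empty parameter list, which is the shape of the symbol reported above. The helper name `demangle_simple` is hypothetical, and the sketch is not a general demangler; a tool such as `c++filt` covers the full grammar.

```ruby
# Minimal demangler for simple Itanium C++ ABI nested names of the form
# _ZN<len><id>...E<params>. Only the empty parameter list ("v") is handled;
# anything else returns nil.
def demangle_simple(symbol)
  return nil unless symbol.start_with?("_ZN")

  rest = symbol.delete_prefix("_ZN")
  parts = []
  # Each component is a decimal length followed by that many characters.
  while (m = rest.match(/\A(\d+)/))
    len = m[1].to_i
    rest = rest[m[1].length..]
    return nil if rest.length < len

    parts << rest[0, len]
    rest = rest[len..]
  end
  # "E" closes the nested name; "v" encodes a void (empty) parameter list.
  return nil unless rest == "Ev" && !parts.empty?

  "#{parts.join('::')}()"
end

puts demangle_simple("_ZN5torch5utils14cuda_lazy_initEv")
```

The result is a function in the `torch::utils` namespace, the same namespace as the `maybe_initialize_cuda` call removed in the diff below.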

(py37) tsuyoshi@Jupiter:~/work/ruby/torch.rb$ git diff
diff --git a/codegen/generate_functions.rb b/codegen/generate_functions.rb
index dfec6e2..79486b5 100644
--- a/codegen/generate_functions.rb
+++ b/codegen/generate_functions.rb
@@ -290,7 +290,7 @@ def generate_tensor_options(function, opt_params)
     code += "\n      .#{c}"
   end

-  "#{code};\n  torch::utils::maybe_initialize_cuda(options);"
+  "#{code};"
 end

 def generate_function_code(function, cpp_name, params, opt_index, remove_self)
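
The effect of the diff above on the generated C++ can be sketched as follows. This is a hypothetical, simplified stand-in for the generator helper, not the project's actual code: the method name `tensor_options_code`, the `option_calls` argument, and the `init_cuda:` flag are all assumptions made for illustration. Only the two return strings are taken from the diff.

```ruby
# Hypothetical, simplified stand-in for the generator helper in the diff.
# `option_calls` and `init_cuda:` are illustrative names, not torch.rb's API.
def tensor_options_code(option_calls, init_cuda: false)
  code = "auto options = torch::TensorOptions()"
  option_calls.each do |c|
    # Chain each option call onto the TensorOptions builder.
    code += "\n      .#{c}"
  end
  # Pre-patch behaviour: the generated C++ ends with the CUDA init call.
  return "#{code};\n  torch::utils::maybe_initialize_cuda(options);" if init_cuda

  # Patched behaviour: only the options statement is emitted.
  "#{code};"
end

calls = ["dtype(torch::kFloat32)", "device(torch::kCUDA)"]
patched = tensor_options_code(calls)
unpatched = tensor_options_code(calls, init_cuda: true)
puts patched
```

With the flag off, the emitted C++ no longer mentions `maybe_initialize_cuda`, so the extension has no reference left to the symbol that could not be resolved.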

Environment: Ubuntu-18.04, CUDA 10.1, libtorch-cxx11-abi-shared-with-deps-1.7.0+cu101.zip, Ruby 2.7.2, torch.rb-0.5.1

ankane commented 3 years ago

Hey @golirev, thanks for reporting, and great investigative work! It looks like that function is Python-specific and not needed here. I've confirmed that removing the call fixes the error and will push out a new release shortly.