evcxr / evcxr


macOS: kernel broken with "tch" dependency #123

Closed maun closed 4 years ago

maun commented 4 years ago

On macOS 10.15.5 the following fails:

:dep tch = "0.1.7"
use tch::Tensor;

With this error:

While processing instruction `Ok("LOAD_AND_RUN /var/folders/xq/459wd5fj5sj2lcnrd21xgvth0000gq/T/.tmpp9ExOe/target/debug/deps/libcode_8.dylib run_user_code_7")`, got error: Message("dlopen(/var/folders/xq/459wd5fj5sj2lcnrd21xgvth0000gq/T/.tmpp9ExOe/target/debug/deps/libcode_8.dylib, 2): Library not loaded: @rpath/libtorch.dylib\n  Referenced from: /private/var/folders/xq/459wd5fj5sj2lcnrd21xgvth0000gq/T/.tmpp9ExOe/target/debug/deps/libcode_8.dylib\n  Reason: image not found")
Child process terminated with status: exit code: 99

The referenced dylib is present:

du -h /var/folders/xq/459wd5fj5sj2lcnrd21xgvth0000gq/T/.tmpp9ExOe/target/debug/deps/libcode_8.dylib
 16K    /var/folders/xq/459wd5fj5sj2lcnrd21xgvth0000gq/T/.tmpp9ExOe/target/debug/deps/libcode_8.dylib

Afterwards evcxr is broken: executing any cell, even after removing the tch dependency, leads to the same error with the number in the library name increased by one.

% objdump -t  /var/folders/xq/459wd5fj5sj2lcnrd21xgvth0000gq/T/.tmpp9ExOe/target/debug/deps/libcode_14.dylib

/var/folders/xq/459wd5fj5sj2lcnrd21xgvth0000gq/T/.tmpp9ExOe/target/debug/deps/libcode_14.dylib: file format Mach-O 64-bit x86-64

SYMBOL TABLE:
0000000000002008 l     O __DATA,__data __dyld_private
0000000000000ed0 g     F __TEXT,__text _bz_internal_error
0000000000000ec0 g     F __TEXT,__text _run_user_code_13
0000000000000000         *UND* __ZN3std9panicking15begin_panic_fmt17h0d20894adb4c9e0eE
0000000000000000         *UND* __ZN4core3fmt3num3imp52_$LT$impl$u20$core..fmt..Display$u20$for$u20$i32$GT$3fmt17he2ccc2060d52a2ceE
0000000000000000         *UND* dyld_stub_binder

After restarting, removing only the use statement (and keeping the :dep line) works fine.

What can I do to get at the real error?

tch provides bindings for PyTorch's libtorch C++ library. libtorch is downloaded by the build script.

davidlattimore commented 4 years ago

Thanks for the report. You can get the directory in which evcxr compiles code by running :last_compile_dir. If you cd into that directory while evcxr is still running, you can run cargo rustc -- -C prefer-dynamic. That's what evcxr runs to compile the module. Experimenting here might give some clue as to what's going on. For example, if you switch it to compiling a binary instead of a shared library, can it run?

Unfortunately, given the nature of the problem, it's likely that switching to compiling a binary will make the problem not reproduce. At that stage, if you're keen, you could try switching back to compiling a shared library and writing a little Rust program to dlopen the shared library. Evcxr uses the libloading crate to open shared objects, so it might look something like the following:

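fn main() {
  // Library::new is an unsafe fn in libloading 0.7 and later; on older versions the unsafe block only triggers a warning.
  unsafe { libloading::Library::new("/somewhere/target/debug/deps/libcode_8.dylib") }.unwrap();
}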
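
Alongside the dlopen experiment, it may be worth checking whether libtorch.dylib exists anywhere under the compile directory. The dyld message names @rpath/libtorch.dylib as the library that could not be loaded, and libcode_8.dylib only as the file that references it. The following is a minimal sketch using only the standard library. The helper find_files_named is hypothetical (it is not part of evcxr or tch), and it assumes the downloaded libtorch ends up somewhere below the directory reported by :last_compile_dir, which may not hold for every setup.

```rust
use std::ffi::OsStr;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collects every file under `root` whose file name equals `name`.
/// Hypothetical helper for illustration; not part of evcxr or tch.
fn find_files_named(root: &Path, name: &str) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    let mut pending = vec![root.to_path_buf()];
    while let Some(dir) = pending.pop() {
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            let path = entry.path();
            // file_type() does not follow symlinks, so symlinked
            // directories are not descended into.
            if entry.file_type()?.is_dir() {
                pending.push(path);
            } else if path.file_name() == Some(OsStr::new(name)) {
                found.push(path);
            }
        }
    }
    Ok(found)
}

fn main() -> io::Result<()> {
    // Pass the directory printed by :last_compile_dir as the first argument;
    // falls back to the current directory if none is given.
    let root = std::env::args().nth(1).unwrap_or_else(|| ".".to_string());
    let hits = find_files_named(Path::new(&root), "libtorch.dylib")?;
    if hits.is_empty() {
        println!("no libtorch.dylib found under {}", root);
    }
    for path in hits {
        println!("{}", path.display());
    }
    Ok(())
}
```

If the file is found, the directory containing it is a candidate for whatever @rpath needs to resolve to. If it is not found, the download in the build script is the more likely place to look.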

Good luck and let me know if you need any more details on anything.

davidlattimore commented 4 years ago

I'll close this for now, but feel free to reopen if the problem still reproduces and you have any more information.

robinbernon commented 3 years ago

Thought I'd offer this info as it may be helpful to some. I had the exact same issue, but on Windows. I realised that I only ever got it when running my notebook on the nightly toolchain. The exact same code ran fine when I switched back to stable. I'm able to keep working on stable, so I didn't do any further debugging of the nightly issue.