dotnet / TorchSharp

A .NET library that provides access to the library that powers PyTorch.
MIT License

Trouble loading TorchSharp Linux CUDA into F# Interactive #345

Open: dsyme opened this issue 3 years ago

dsyme commented 3 years ago

@gbaydin spotted problems with dynamically loading TorchSharp for Linux + CUDA into F# Interactive. This will also affect .NET Notebooks.

  1. If a #I include-path directive points at a directory containing LibTorchSharp and the libtorch-cpu-* CPU binaries, those CPU binaries are preferred and the CUDA load fails.

  2. The TorchSharp native component loader for F#/.NET Interactive has an explicit version number hard-wired into it, which it uses to find matching libtorch packages. This version number hasn't been updated. We should pick it up from the TorchProperties build instead of hard-wiring it into the source: https://github.com/dotnet/TorchSharp/blob/decd474288196e8f4119991a69def32f0e106eff/src/TorchSharp/Torch.cs#L17

  3. The diagnostic given when a libtorch CPU backend is loaded during CUDA initialization could be much more detailed. Currently it fails with "System.InvalidOperationException: Torch device type CUDA did not initialise on the current machine." Instead, it could report that a CPU libtorch was loaded, give the locations of the native DLLs that were loaded, and so on.

We think that fixing these three together will solve the problem.
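Until the loader reports which native binaries it actually picked up (item 3), they can be listed by hand from the same F# Interactive session after the failure. The following is a minimal sketch assuming Linux, since it reads `/proc/self/maps`. The `loadedTorchLibraries` helper and the `torch_cuda` file-name check are hypothetical illustrations, not TorchSharp APIs.

```fsharp
open System.IO

/// Hypothetical helper: returns the distinct paths of native libraries mapped into
/// the current process whose file name mentions torch or c10.
/// Linux only: it reads /proc/self/maps.
let loadedTorchLibraries () : string[] =
    File.ReadAllLines "/proc/self/maps"
    |> Array.choose (fun line ->
        // Line format: address perms offset dev inode [pathname].
        // Only the pathname field contains '/', so it starts at the first '/'.
        match line.IndexOf '/' with
        | -1 -> None
        | i -> Some (line.Substring i))
    |> Array.filter (fun path ->
        let name = Path.GetFileName(path).ToLowerInvariant()
        name.Contains "torch" || name.Contains "c10")
    |> Array.distinct

let libs = loadedTorchLibraries ()

// Assumption: CUDA builds of libtorch ship a library whose file name contains "torch_cuda".
let cudaLoaded =
    libs |> Array.exists (fun p -> Path.GetFileName(p).Contains "torch_cuda")

for path in libs do
    printfn "loaded: %s" path
printfn "CUDA libtorch loaded: %b" cudaLoaded
```

If `libtorch_cpu` shows up but nothing named `torch_cuda` does, that suggests the CPU backend was picked up, for example through a #I include path as in item 1. The directory in each printed path shows which package the binary came from.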

dsyme commented 3 years ago

Typical failure:

    TorchSharp: LoadNativeBackend: Initialising native backend
    TorchSharp: LoadNativeBackend: Try loading torch_cuda native component
    TorchSharp: LoadNativeBackend: Loading LibTorchSharp
    TorchSharp: LoadNativeBackend: Loaded LibTorchSharp, ok = False
    TorchSharp: LoadNativeBackend: Native backend not found in application loading TorchSharp directly from packages directory.
    TorchSharp: LoadNativeBackend: Trying dynamic load for .NET/F# Interactive by consolidating native libtorch-cuda-11.1-linux-x64-* binaries to /home/gunes/.nuget/packages/torchsharp/0.91.52719/lib/netcoreapp3.1/cuda-11.1...
    CopyNativeComponentsIntoSingleDirectory: packagesDir = /home/gunes/.nuget/packages
    System.NotSupportedException: The libtorch-cuda-11.1-linux-x64 package version 1.9.0.7 is not restored on this system. If using F# Interactive or .NET Interactive you may need to add a reference to this package, e.g.
        #r "nuget: libtorch-cuda-11.1-linux-x64, 1.9.0.7"
       at TorchSharp.torch.LoadNativeBackend(Boolean useCudaBackend) in TorchSharp.dll:token 0x60001be+0x39a
       at TorchSharp.torch.TryInitializeDeviceType(DeviceType deviceType) in TorchSharp.dll:token 0x60001bf+0x0
       at TorchSharp.torch.InitializeDeviceType(DeviceType deviceType) in TorchSharp.dll:token 0x60001c1+0x0
       at TorchSharp.torch.InitializeDevice(Device device) in TorchSharp.dll:token 0x60001c2+0xa
       at <StartupCode$FSI_0002>.$FSI_0002.main@() in RefEmit_InMemoryManifestModule:token 0x600000b+0x92
    Stopped due to error

dsyme commented 3 years ago

To turn on tracing of the load process we used this:

    open System.Diagnostics
    // Route Trace output, including TorchSharp's LoadNativeBackend messages, to the console
    let tracer = new ConsoleTraceListener()
    Trace.Listeners.Add(tracer) |> ignore

To test out one of the load-native-library probes we used this:

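    open TorchSharp
    open System.Runtime.InteropServices
    // Probe for the native LibTorchSharp library, resolving it relative to the
    // TorchSharp assembly with the default search path (empty Nullable)
    let assembly = typeof<torch>.Assembly
    // ok: whether the load succeeded; result: the native library handle (zero on failure)
    let ok, result = NativeLibrary.TryLoad("LibTorchSharp", assembly, System.Nullable())
    printfn $"ok = {ok}, result = {result}"
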
NiklasGustafsson commented 1 year ago

@dsyme -- is this still a problem for DiffSharp?