dotnet / TorchSharp

A .NET library that provides access to the library that powers PyTorch.

Can't use Linux libtorch packages from .NET Interactive and F# Interactive #169

Closed dsyme closed 3 years ago

dsyme commented 4 years ago

There are problems using LibTorch packages from .NET Interactive and F# Interactive on Linux because the native libraries are not unified into a single directory.

As a workaround, you can avoid the packages and load the native library directly:

open System.Runtime.InteropServices
NativeLibrary.Load("/home/gunes/anaconda3/lib/python3.8/site-packages/torch/lib/libtorch.so")

The problem was also reported previously on Windows but is currently believed to be fixed. If it is not, use the same approach there:

open System.Runtime.InteropServices
NativeLibrary.Load(@"D:\libtorch\lib\torch_cuda.dll")

Examples:

#r "nuget: TorchSharp,0.3.52276"
#r "nuget: libtorch-cuda-10.2-win-x64,1.5.6"

or

#r "nuget: TorchSharp,0.3.52276"
#r "nuget: libtorch-cuda-10.2-linux-x64,1.5.6"

The problem is that .NET Interactive and F# Interactive load DLLs directly from package directories, instead of from a collected application directory. For managed DLLs this works OK, but native DLLs do not load transitive dependencies unless load paths are set up.
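To see which transitive dependencies the loader has to resolve for a given native library, one can list its NEEDED entries. The following is a minimal sketch, assuming a Linux machine with binutils (objdump) installed; the helper name `needed_libs` is made up for illustration and is not part of TorchSharp.

```shell
# needed_libs: read `objdump -p` output on stdin and print the shared
# libraries the binary declares as direct dependencies (NEEDED entries).
# Hypothetical helper for illustration only.
needed_libs() {
    awk '$1 == "NEEDED" { print $2 }'
}
```

For example, `objdump -p /root/.nuget/packages/torchsharp/0.91.52458/runtimes/linux-x64/native/libLibTorchSharp.so | needed_libs` prints the libraries that must be findable when that file is loaded. Any of them that live in a different package directory will not be found unless they were loaded earlier or a load path points at them.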

This is a general issue with the package load process used by .NET/F# Interactive, see https://github.com/dotnet/fsharp/issues/10136. We may be able to work around the issue here, though it is challenging for two reasons:

  1. There are multiple different runtime native DLLs that work with the same managed DLL - basically CPU and GPU - the end application selects one

  2. The collected native DLLs are too large to fit in one NuGet package; for GPU they are about 1.5GB. They must therefore be delivered in multiple packages, because in practice nuget.org, Azure CI and other services limit NuGet package size to around 200MB.

Together these mean that the native DLLs end up scattered across different package directories.
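One way to picture "unifying" the scattered libraries on Linux is to symlink them into a single directory and point the loader at it. This is only a sketch under assumptions: the function name `collect_native_libs` and the directory layout are hypothetical, it is not what TorchSharp does, and it has not been checked against the GPU packages.

```shell
# collect_native_libs TARGET_DIR SOURCE_DIR...
# Symlink every shared library found directly in the SOURCE_DIRs into
# TARGET_DIR, so that all of them sit in one directory.
# Hypothetical helper for illustration only.
collect_native_libs() {
    cnl_target="$1"
    shift
    mkdir -p "$cnl_target"
    for cnl_dir in "$@"; do
        # Skip source directories that do not exist
        [ -d "$cnl_dir" ] || continue
        # Use an absolute path so the symlinks stay valid from anywhere
        cnl_abs="$(cd "$cnl_dir" && pwd)"
        for cnl_lib in "$cnl_abs"/*.so*; do
            # An unmatched glob stays literal, so check the file exists
            [ -e "$cnl_lib" ] || continue
            ln -sf "$cnl_lib" "$cnl_target/$(basename "$cnl_lib")"
        done
    done
}
```

A script could then be started with `LD_LIBRARY_PATH` set to the target directory, for example `LD_LIBRARY_PATH=/tmp/torch-native dotnet fsi foo.fsx`, so that transitive dependencies resolve from one place.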

A workaround used in DiffSharp for the CPU case is to force the load of libtorch.so (Linux) or torch_cpu.dll (Windows) before any other loads are requested:

open System.Runtime.InteropServices
open System.IO

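// Directory of the managed DiffSharp assembly inside the NuGet package cache
let path1 = Path.GetDirectoryName(typeof<DiffSharp.dsharp>.Assembly.Location)
// Native CPU library shipped in the separate libtorch-cpu package
let path2 =
    if RuntimeInformation.IsOSPlatform(OSPlatform.Linux) then
        path1 + "/../../../../libtorch-cpu/1.8.0.7/runtimes/linux-x64/native/libtorch.so"
    else
        path1 + "/../../../../libtorch-cpu/1.8.0.7/runtimes/win-x64/native/torch_cpu.dll"
// Force the load before any other native load is requested
NativeLibrary.Load(path2) |> ignore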

However, this workaround doesn't work for the GPU case. Although awkward, it is probably worth developing a similar workaround for the GPU case and adding both to the platform initialization logic of TorchSharp. In practice, the sheer size of the corresponding native binaries makes this library a particular challenge.

There is also the general issue of long download times on first use, which is potentially very significant for notebook startup times on a container (though likely ok if running in a data centre).

dsyme commented 4 years ago

Using this to debug a local build:

#i @"nuget: E:\GitHub\dsyme\TorchSharp\bin/packages/Debug";;
#r @"nuget: TorchSharp,0.3.0-local-Debug-20200918";;
TorchSharp.Torch.IsCudaAvailable();;

dotnet artifacts\bin\fsi\Debug\netcoreapp3.1\fsi.exe /langversion:preview < a.fs

fwaris commented 3 years ago

FYI: When using the 'lite' version, the following seems to work for GPU.

#r "nuget: DiffSharp-lite,1.0.0-preview-485581354"
System.Runtime.InteropServices.NativeLibrary.Load(@"D:\libtorch\lib\torch_cuda.dll")

Here the libtorch binaries were downloaded and installed separately from https://pytorch.org/

dsyme commented 3 years ago

Using a Colab GPU-enabled notebook to look into this:

  1. Install .NET SDK

    !wget https://packages.microsoft.com/config/ubuntu/18.04/packages-microsoft-prod.deb -O packages-microsoft-prod.deb && sudo dpkg -i packages-microsoft-prod.deb && sudo apt-get update && sudo apt-get install -y apt-transport-https && sudo apt-get update && sudo apt-get install -y dotnet-sdk-5.0

    !dotnet --version

  2. Get package

    !echo "printfn \"phase0\"" > foo.fsx
    !echo "#r \"nuget: DiffSharp-cpu, 1.0.0-preview-681551353\";;" >> foo.fsx
    !cat foo.fsx
    !dotnet fsi foo.fsx

  3. Investigate dependencies

    !ls /root/.nuget/packages/libtorch-cpu/1.8.0.7/runtimes/linux-x64/native
    !ls /root/.nuget/packages/torchsharp/0.91.52458/runtimes/linux-x64/native/
    !echo LD_LIBRARY_PATH=$LD_LIBRARY_PATH
    !ldd /root/.nuget/packages/torchsharp/0.91.52458/runtimes/linux-x64/native/libLibTorchSharp.so

    This reveals:

    /root/.nuget/packages/torchsharp/0.91.52458/runtimes/linux-x64/native/libLibTorchSharp.so: /lib/x86_64-linux-gnu/libpthread.so.0: version GLIBC_2.30 not found (required by /root/.nuget/packages/torchsharp/0.91.52458/runtimes/linux-x64/native/libLibTorchSharp.so)

    This is an Ubuntu 18.04 problem - need to investigate where this dependency is coming from

  4. Try explicit NativeLibrary.Load:

    !echo "printfn \"phase1\"" > foo.fsx
    !echo "open System.Runtime.InteropServices" >> foo.fsx
    !echo "NativeLibrary.Load(\"/root/.nuget/packages/libtorch-cpu/1.8.0.7/runtimes/linux-x64/native/libtorch.so\") |> printfn \"%A\";;" >> foo.fsx
    !echo "NativeLibrary.Load(\"/root/.nuget/packages/torchsharp/0.91.52458/runtimes/linux-x64/native/libLibTorchSharp.so\") |> printfn \"%A\";;" >> foo.fsx
    !echo "printfn \"phase2\"" >> foo.fsx
    !echo "DiffSharp.dsharp.devices(backend=DiffSharp.Backend.Torch) |> printfn \"%A\"" >> foo.fsx
    !cat foo.fsx
    !dotnet fsi foo.fsx

  5. Hack to update to GLIBC_2.30 on the Colab machine

    !echo "deb http://ftp.us.debian.org/debian testing main contrib non-free" >> /etc/apt/sources.list
    !apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 04EE7237B7D453EC
    !apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 648ACFD622F3D138
    !apt-get update
    !apt-get install build-essential -y
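The dependency check in step 3 can be made a little more systematic. The sketch below assumes a glibc-based Linux with GNU coreutils (`sort -V`) and binutils (`objdump`); the helper names are hypothetical and were not part of the original investigation.

```shell
# missing_deps: read `ldd` output on stdin and print the names of
# libraries the loader could not resolve.
missing_deps() {
    awk '/=> not found/ { print $1 }'
}

# version_ge A B: succeed when dotted version A is >= B.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# required_glibc LIB: print the highest GLIBC_x.y symbol version that
# the dynamic symbols of LIB refer to.
required_glibc() {
    objdump -T "$1" | grep -o 'GLIBC_[0-9][0-9.]*' | sed 's/^GLIBC_//' | sort -V | tail -n 1
}

# system_glibc: print the glibc version installed on this machine.
system_glibc() {
    getconf GNU_LIBC_VERSION | awk '{ print $2 }'
}
```

With these, `ldd libLibTorchSharp.so | missing_deps` lists unresolved libraries, and `version_ge "$(system_glibc)" "$(required_glibc libLibTorchSharp.so)"` fails when the machine's glibc is older than what the library was built against, which is the situation the GLIBC_2.30 error above describes.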

dsyme commented 3 years ago

This problem is now fixed. I've left a notebook about the investigation in the DiffSharp repo: https://github.com/DiffSharp/DiffSharp/blob/dev/notebooks/debug/NativeCudaLoadLinux.ipynb