tombatron opened this issue 1 year ago
I've tried to reproduce this problem with WSL, but I'm running into a very different problem, one that doesn't even get as far as calling is_available().
It's worth trying -- and this is a total shot in the dark -- to delete everything *torch* under ~/.nuget/packages/ and then try again. I wonder if there's some sort of package confusion going on when running with .NET Interactive.
Yeah, that didn't seem to have any impact. :\
Here is a directory listing of my .nuget directory on the Jupyter server:
drwxr-sr-x 3 jovyan users 4096 Nov 14 15:07 google.protobuf
drwxr-sr-x 3 jovyan users 4096 Nov 14 15:07 ilgpu
drwxr-sr-x 3 jovyan users 4096 Nov 18 14:57 libtorch-cuda-12.1-linux-x64
drwxr-sr-x 3 jovyan users 4096 Nov 18 14:57 libtorch-cuda-12.1-linux-x64-part1
drwxr-sr-x 3 jovyan users 4096 Nov 18 14:57 libtorch-cuda-12.1-linux-x64-part2-fragment1
drwxr-sr-x 3 jovyan users 4096 Nov 18 14:57 libtorch-cuda-12.1-linux-x64-part2-primary
drwxr-sr-x 3 jovyan users 4096 Nov 18 14:57 libtorch-cuda-12.1-linux-x64-part3-fragment1
drwxr-sr-x 3 jovyan users 4096 Nov 18 14:57 libtorch-cuda-12.1-linux-x64-part3-fragment2
drwxr-sr-x 3 jovyan users 4096 Nov 18 14:57 libtorch-cuda-12.1-linux-x64-part3-fragment3
drwxr-sr-x 3 jovyan users 4096 Nov 18 14:57 libtorch-cuda-12.1-linux-x64-part3-primary
drwxr-sr-x 3 jovyan users 4096 Nov 18 14:57 libtorch-cuda-12.1-linux-x64-part4-fragment1
drwxr-sr-x 3 jovyan users 4096 Nov 18 14:57 libtorch-cuda-12.1-linux-x64-part4-primary
drwxr-sr-x 3 jovyan users 4096 Nov 18 14:57 libtorch-cuda-12.1-linux-x64-part5-fragment1
drwxr-sr-x 3 jovyan users 4096 Nov 18 14:57 libtorch-cuda-12.1-linux-x64-part5-primary
drwxr-sr-x 3 jovyan users 4096 Nov 18 14:57 libtorch-cuda-12.1-linux-x64-part6
drwxr-sr-x 3 jovyan users 4096 Nov 18 14:57 libtorch-cuda-12.1-linux-x64-part7
drwxr-sr-x 3 jovyan users 4096 Nov 14 15:07 sharpziplib
drwxr-sr-x 3 jovyan users 4096 Nov 14 15:07 skiasharp
drwxr-sr-x 3 jovyan users 4096 Nov 14 15:07 skiasharp.nativeassets.macos
drwxr-sr-x 3 jovyan users 4096 Nov 14 15:07 skiasharp.nativeassets.win32
drwxr-sr-x 3 jovyan users 4096 Nov 14 15:07 system.memory
drwxr-sr-x 3 jovyan users 4096 Nov 18 14:57 torchsharp
drwxr-sr-x 3 jovyan users 4096 Nov 18 14:57 torchsharp-cuda-linux
Here is the error message:
System.TypeInitializationException: The type initializer for 'TorchSharp.torch' threw an exception.
---> System.NotSupportedException: The libtorch-cpu-linux-x64 package version 2.1.0.1 is not restored on this system. If using F# Interactive or .NET Interactive you may need to add a reference to this package, e.g.
#r "nuget: libtorch-cpu-linux-x64, 2.1.0.1". Trace from LoadNativeBackend:
TorchSharp: LoadNativeBackend: Initialising native backend, useCudaBackend = False
Step 1 - First try regular load of native libtorch binaries.
Trying to load native component torch_cpu relative to /home/jovyan/.nuget/packages/torchsharp/0.101.2/lib/net6.0/TorchSharp.dll
Failed to load native component torch_cpu relative to /home/jovyan/.nuget/packages/torchsharp/0.101.2/lib/net6.0/TorchSharp.dll
Trying to load native component LibTorchSharp relative to /home/jovyan/.nuget/packages/torchsharp/0.101.2/lib/net6.0/TorchSharp.dll
Failed to load native component LibTorchSharp relative to /home/jovyan/.nuget/packages/torchsharp/0.101.2/lib/net6.0/TorchSharp.dll
Result from regular native load of LibTorchSharp is False
Step 3 - Alternative load from consolidated directory of native binaries from nuget packages
torchsharpLoc = /home/jovyan/.nuget/packages/torchsharp/0.101.2/lib/net6.0
packagesDir = /home/jovyan/.nuget/packages
torchsharpHome = /home/jovyan/.nuget/packages/torchsharp/0.101.2
Trying dynamic load for .NET/F# Interactive by consolidating native libtorch-cpu-linux-x64-* binaries to /home/jovyan/.nuget/packages/torchsharp/0.101.2/lib/net6.0/cpu...
Consolidating native binaries, packagesDir=/home/jovyan/.nuget/packages, packagePattern=libtorch-cpu-linux-x64, packageVersion=2.1.0.1 to target=/home/jovyan/.nuget/packages/torchsharp/0.101.2/lib/net6.0/cpu...
at TorchSharp.torch.LoadNativeBackend(Boolean useCudaBackend, StringBuilder& trace)
at TorchSharp.torch.InitializeDeviceType(DeviceType deviceType)
at TorchSharp.torch.InitializeDevice(Device device)
at TorchSharp.torch..cctor()
--- End of inner exception stack trace ---
at TorchSharp.torch.TryInitializeDeviceType(DeviceType deviceType)
at TorchSharp.torch.cuda.is_available()
at Submission#5.<<Initialize>>d__0.MoveNext()
--- End of stack trace from previous location ---
at Microsoft.CodeAnalysis.Scripting.ScriptExecutionState.RunSubmissionsAsync[TResult](ImmutableArray`1 precedingExecutors, Func`2 currentExecutor, StrongBox`1 exceptionHolderOpt, Func`2 catchExceptionOpt, CancellationToken cancellationToken)
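For context, the notebook cell that hits this path boils down to a package reference plus the availability check. Here's a minimal sketch -- the exact cell isn't shown above, and the TorchSharp-cuda-linux name and 0.101.2 version are inferred from the package directories and trace paths:

// Sketch of the interactive cell; the package name and version are
// assumptions taken from the directory listing and trace paths above.
#r "nuget: TorchSharp-cuda-linux, 0.101.2"

using System;
using TorchSharp;

// This is the call that ends up in LoadNativeBackend and produces the trace above.
Console.WriteLine(torch.cuda.is_available());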
A co-worker of mine (@wss-rbrennan) may have shed some light on this issue:
"The problem has more to do with NuGet itself. TorchSharp used a clever way of putting together the libtorch-cuda-12.1-linux-x64 package because NuGet has a max package size of 250 MB. The workaround combines multiple packages at build time in a project, so your project works, but interactive doesn't build the same way, so the reference fails."
Not sure if this is a problem per se, or just something to account for when using TorchSharp from interactive mode?
Thank you for the follow-up, and that's sort of what I was seeing, too. But... it used to work!
The stitching together only happens the first time, i.e. when a build finds that the stitched package is not available in the NuGet cache locally.
You think there is some sort of snippet that could be run to ensure proper stitching?
And it works on Windows, which has the same package stitching problem.
"You think there is some sort of snippet that could be run to ensure proper stitching?"
All I can think of is a dotnet build, but I think you already did that and it worked, so the stitching should already have been done.
Or, maybe... clear the ~/.nuget/packages cache, as well as anything under ~/.packagemanagement/nuget. Then, build your console program again, then try the .ipynb file again. Another shot in the dark...
Okay, so after a bunch of finagling, I finally get to where you are -- no blow-up when loading the backend, but is_available() returns false. It works fine when I run one of the TorchExamples on CUDA, or on Windows, either interactively or as a console app.
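One thing worth checking at this point is what actually got consolidated next to TorchSharp.dll. A rough diagnostic sketch -- the trace above only shows a "cpu" target, so the existence and name of a CUDA sibling directory here is an assumption:

// Sketch: list the directories the consolidation step created next to
// TorchSharp.dll and how many files each holds. The trace above shows a
// "cpu" target; whether a CUDA sibling exists is an assumption.
#r "nuget: TorchSharp-cuda-linux, 0.101.2"

using System;
using System.IO;
using System.Linq;
using TorchSharp;

var torchSharpDir = Path.GetDirectoryName(typeof(torch).Assembly.Location);
Console.WriteLine($"TorchSharp.dll loaded from: {torchSharpDir}");

foreach (var dir in Directory.EnumerateDirectories(torchSharpDir))
    Console.WriteLine($"{dir}: {Directory.EnumerateFiles(dir).Count()} files");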
Hi there!
This may be related to #345, so please bear with me.
I'm trying to use TorchSharp with dotnet-interactive in a Jupyter notebook, and I'm encountering the following behavior:
Now, I am running my setup through Docker, so I wondered if perhaps I had an issue there; to rule that out, I made a quick console application to test "connectivity" with my GPU.
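The console application is essentially just this kind of check -- a minimal sketch, since the original code isn't reproduced here, and the TorchSharp-cuda-linux package reference is an assumption:

// Program.cs (sketch): minimal GPU "connectivity" check.
// Assumes the project references the TorchSharp-cuda-linux package.
using System;
using TorchSharp;

Console.WriteLine($"CUDA available: {torch.cuda.is_available()}");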
I'm kind of struggling to get my arms around the issue; what are some next steps I could take?
Cheers!