dotnet / TorchSharp

A .NET library that provides access to the library that powers PyTorch.
MIT License

Linux CUDA packages don't restore #293

Closed by dsyme 3 years ago

dsyme commented 3 years ago

The Linux CUDA packages failed to restore in the DiffSharp repo build; see https://github.com/DiffSharp/DiffSharp/actions/runs/958984202

         /home/runner/.nuget/packages/libtorch-cuda-11.1-linux-x64-part2-primary/1.9.0.5/buildTransitive/netstandard2.0/libtorch-cuda-11.1-linux-x64-part2-primary.targets(204,5): error MSB4018: The "FileRestitch" task failed unexpectedly. [/home/runner/work/DiffSharp/DiffSharp/bundles/DiffSharp-cuda-linux/DiffSharp-cuda-linux.fsproj]
       /home/runner/.nuget/packages/libtorch-cuda-11.1-linux-x64-part2-primary/1.9.0.5/buildTransitive/netstandard2.0/libtorch-cuda-11.1-linux-x64-part2-primary.targets(204,5): error MSB4018: System.Exception: Error downloading and reviving packages. Reconsituted file contents have incorrect SHA [/home/runner/work/DiffSharp/DiffSharp/bundles/DiffSharp-cuda-linux/DiffSharp-cuda-linux.fsproj]
       /home/runner/.nuget/packages/libtorch-cuda-11.1-linux-x64-part2-primary/1.9.0.5/buildTransitive/netstandard2.0/libtorch-cuda-11.1-linux-x64-part2-primary.targets(204,5): error MSB4018:     Expected SHA: $94b99584debeb3c52a5d3f173d157db81289d7741923bccbcec10d8b9ff09f0e [/home/runner/work/DiffSharp/DiffSharp/bundles/DiffSharp-cuda-linux/DiffSharp-cuda-linux.fsproj]
dsyme commented 3 years ago

Fixed by https://github.com/xamarin/TorchSharp/commit/06a87ada2679b4c1c3f9f898e658c5f9fd6038d3; the packages will need to be republished.

dsyme commented 3 years ago

This should be fixed by these versions:

    <LibTorchNugetVersion>1.9.0.6</LibTorchNugetVersion>
    <TorchSharpVersion>0.91.52672</TorchSharpVersion>
dsyme commented 3 years ago

Reopening: the FileRestitcher has now hit the maximum size of .NET arrays, which was bound to happen sooner or later.

         /home/runner/.nuget/packages/libtorch-cuda-11.1-linux-x64-part2-primary/1.9.0.6/buildTransitive/netstandard2.0/libtorch-cuda-11.1-linux-x64-part2-primary.targets(204,5): error MSB4018: The "FileRestitch" task failed unexpectedly. [/home/runner/work/DiffSharp/DiffSharp/bundles/DiffSharp-cuda-linux/DiffSharp-cuda-linux.fsproj]
       /home/runner/.nuget/packages/libtorch-cuda-11.1-linux-x64-part2-primary/1.9.0.6/buildTransitive/netstandard2.0/libtorch-cuda-11.1-linux-x64-part2-primary.targets(204,5): error MSB4018: System.OverflowException: Array dimensions exceeded supported range. [/home/runner/work/DiffSharp/DiffSharp/bundles/DiffSharp-cuda-linux/DiffSharp-cuda-linux.fsproj]
       /home/runner/.nuget/packages/libtorch-cuda-11.1-linux-x64-part2-primary/1.9.0.6/buildTransitive/netstandard2.0/libtorch-cuda-11.1-linux-x64-part2-primary.targets(204,5): error MSB4018:    at InlineCode.FileRestitch.Execute() [/home/runner/work/DiffSharp/DiffSharp/bundles/DiffSharp-cuda-linux/DiffSharp-cuda-linux.fsproj]
       /home/runner/.nuget/packages/libtorch-cuda-11.1-linux-x64-part2-primary/1.9.0.6/buildTransitive/netstandard2.0/libtorch-cuda-11.1-linux-x64-part2-primary.targets(204,5): error MSB4018:    at Microsoft.Build.BackEnd.TaskExecutionHost.Microsoft.Build.BackEnd.ITaskExecutionHost.Execute() [/home/runner/work/DiffSharp/DiffSharp/bundles/DiffSharp-cuda-linux/DiffSharp-cuda-linux.fsproj]
       /home/runner/.nuget/packages/libtorch-cuda-11.1-linux-x64-part2-primary/1.9.0.6/buildTransitive/netstandard2.0/libtorch-cuda-11.1-linux-x64-part2-primary.targets(204,5): error MSB4018:    at Microsoft.Build.BackEnd.TaskBuilder.ExecuteInstantiatedTask(ITaskExecutionHost taskExecutionHost, TaskLoggingContext taskLoggingContext, TaskHost taskHost, ItemBucket bucket, TaskExecutionMode howToExecuteTask) [/home/runner/work/DiffSharp/DiffSharp/bundles/DiffSharp-cuda-linux/DiffSharp-cuda-linux.fsproj]
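
Both failures above (the incorrect SHA and the array overflow) come from the same step: joining package fragments back into one large native binary and checking its hash. Below is a minimal Python sketch of how that step can be done without ever holding the whole file in memory. It is not the actual `FileRestitch` task, whose source is not shown in this thread; the function name `restitch`, its parameters, and the chunk size are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an assumed value, any modest size works


def restitch(fragments, output_path, expected_sha256):
    """Concatenate fragment files into output_path and verify the SHA-256.

    Hypothetical helper: data is copied chunk by chunk, so no single
    buffer ever has to hold the full (possibly > 2 GB) file.
    """
    digest = hashlib.sha256()
    with open(output_path, "wb") as out:
        for fragment in fragments:
            with open(fragment, "rb") as src:
                # Read until EOF; each chunk is written and hashed immediately.
                while chunk := src.read(CHUNK_SIZE):
                    out.write(chunk)
                    digest.update(chunk)
    actual = digest.hexdigest()
    if actual != expected_sha256.lower():
        Path(output_path).unlink()  # do not leave a corrupt file behind
        raise ValueError(
            f"incorrect SHA: expected {expected_sha256}, got {actual}"
        )
    return actual
```

The point of the sketch is the incremental `digest.update(chunk)` call: hashing as the data streams through avoids allocating one byte array for the entire file, which is the kind of allocation that a roughly 2 GB array limit would reject.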
dsyme commented 3 years ago

That one is now fixed in master, awaiting a package build.

dsyme commented 3 years ago

The TorchSharp problems are now fixed. However, some part of the .NET MSBuild infrastructure is unable to cope with > 2GB native binaries. In this case the GenerateDepsFile task calls GetVersionInfo, which tries to treat the massive libtorch binary as a managed assembly via TryLoadManagedAssemblyMetadata, causing an exception at GetAndValidateSize.

I'll check the source code for these. If there's no workaround, it may be impossible to deliver 1.9.0 libtorch CUDA on Linux via NuGet packages.

       "/home/runner/work/DiffSharp/DiffSharp/bundles/DiffSharp-cuda-linux/DiffSharp-cuda-linux.fsproj" (pack target) (1:7) ->
       (GenerateBuildDependencyFile target) -> 
         /home/runner/.dotnet/sdk/5.0.100/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.Sdk.targets(195,5): error MSB4018: The "GenerateDepsFile" task failed unexpectedly. [/home/runner/work/DiffSharp/DiffSharp/bundles/DiffSharp-cuda-linux/DiffSharp-cuda-linux.fsproj]
       /home/runner/.dotnet/sdk/5.0.100/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.Sdk.targets(195,5): error MSB4018: System.ArgumentException: Stream length minus starting position is too large to hold a PEImage. (Parameter 'peStream') [/home/runner/work/DiffSharp/DiffSharp/bundles/DiffSharp-cuda-linux/DiffSharp-cuda-linux.fsproj]
       /home/runner/.dotnet/sdk/5.0.100/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.Sdk.targets(195,5): error MSB4018:    at System.Reflection.Internal.StreamExtensions.GetAndValidateSize(Stream stream, Int32 size, String streamParameterName) [/home/runner/work/DiffSharp/DiffSharp/bundles/DiffSharp-cuda-linux/DiffSharp-cuda-linux.fsproj]
       /home/runner/.dotnet/sdk/5.0.100/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.Sdk.targets(195,5): error MSB4018:    at System.Reflection.PortableExecutable.PEReader..ctor(Stream peStream, PEStreamOptions options, Int32 size) [/home/runner/work/DiffSharp/DiffSharp/bundles/DiffSharp-cuda-linux/DiffSharp-cuda-linux.fsproj]
       /home/runner/.dotnet/sdk/5.0.100/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.Sdk.targets(195,5): error MSB4018:    at System.Diagnostics.FileVersionInfo.TryLoadManagedAssemblyMetadata() [/home/runner/work/DiffSharp/DiffSharp/bundles/DiffSharp-cuda-linux/DiffSharp-cuda-linux.fsproj]
       /home/runner/.dotnet/sdk/5.0.100/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.Sdk.targets(195,5): error MSB4018:    at System.Diagnostics.FileVersionInfo.GetVersionInfo(String fileName) [/home/runner/work/DiffSharp/DiffSharp/bundles/DiffSharp-cuda-linux/DiffSharp-cuda-linux.fsproj]
       /home/runner/.dotnet/sdk/5.0.100/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.Sdk.targets(195,5): error MSB4018:    at Microsoft.NET.Build.Tasks.FileUtilities.GetFileVersion(String sourcePath) [/home/runner/work/DiffSharp/DiffSharp/bundles/DiffSharp-cuda-linux/DiffSharp-cuda-linux.fsproj]
       /home/runner/.dotnet/sdk/5.0.100/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.Sdk.targets(195,5): error MSB4018:    at Microsoft.NET.Build.Tasks.DependencyContextBuilder.CreateRuntimeFile(String path, String fullPath) [/home/runner/work/DiffSharp/DiffSharp/bundles/DiffSharp-cuda-linux/DiffSharp-cuda-linux.fsproj]
       /home/runner/.dotnet/sdk/5.0.100/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.Sdk.targets(195,5): error MSB4018:    at System.Linq.Enumerable.SelectListIterator`2.ToArray() [/home/runner/work/DiffSharp/DiffSharp/bundles/DiffSharp-cuda-linux/DiffSharp-cuda-linux.fsproj]
       /home/runner/.dotnet/sdk/5.0.100/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.Sdk.targets(195,5): error MSB4018:    at Microsoft.NET.Build.Tasks.DependencyContextBuilder.GetRuntimeLibrary(DependencyLibrary library) [/home/runner/work/DiffSharp/DiffSharp/bundles/DiffSharp-cuda-linux/DiffSharp-cuda-linux.fsproj]
       /home/runner/.dotnet/sdk/5.0.100/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.Sdk.targets(195,5): error MSB4018:    at Microsoft.NET.Build.Tasks.DependencyContextBuilder.Build() [/home/runner/work/DiffSharp/DiffSharp/bundles/DiffSharp-cuda-linux/DiffSharp-cuda-linux.fsproj]
       /home/runner/.dotnet/sdk/5.0.100/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.Sdk.targets(195,5): error MSB4018:    at Microsoft.NET.Build.Tasks.GenerateDepsFile.WriteDepsFile(String depsFilePath) [/home/runner/work/DiffSharp/DiffSharp/bundles/DiffSharp-cuda-linux/DiffSharp-cuda-linux.fsproj]
       /home/runner/.dotnet/sdk/5.0.100/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.Sdk.targets(195,5): error MSB4018:    at Microsoft.NET.Build.Tasks.GenerateDepsFile.ExecuteCore() [/home/runner/work/DiffSharp/DiffSharp/bundles/DiffSharp-cuda-linux/DiffSharp-cuda-linux.fsproj]
       /home/runner/.dotnet/sdk/5.0.100/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.Sdk.targets(195,5): error MSB4018:    at Microsoft.NET.Build.Tasks.TaskBase.Execute() [/home/runner/work/DiffSharp/DiffSharp/bundles/DiffSharp-cuda-linux/DiffSharp-cuda-linux.fsproj]
       /home/runner/.dotnet/sdk/5.0.100/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.Sdk.targets(195,5): error MSB4018:    at Microsoft.Build.BackEnd.TaskExecutionHost.Microsoft.Build.BackEnd.ITaskExecutionHost.Execute() [/home/runner/work/DiffSharp/DiffSharp/bundles/DiffSharp-cuda-linux/DiffSharp-cuda-linux.fsproj]
       /home/runner/.dotnet/sdk/5.0.100/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.Sdk.targets(195,5): error MSB4018:    at Microsoft.Build.BackEnd.TaskBuilder.ExecuteInstantiatedTask(ITaskExecutionHost taskExecutionHost, TaskLoggingContext taskLoggingContext, TaskHost taskHost, ItemBucket bucket, TaskExecutionMode howToExecuteTask) [/home/runner/work/DiffSharp/DiffSharp/bundles/DiffSharp-cuda-linux/DiffSharp-cuda-linux.fsproj]
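
The stack trace shows `GetAndValidateSize` taking an `Int32 size`, so the ceiling here is the largest value a signed 32-bit integer can hold. The Python sketch below only illustrates that arithmetic and one way a caller could skip oversized files before trying to read them as PE images. The helper names are hypothetical and this is not a description of how the .NET SDK or runtime handles the case.

```python
import os

INT32_MAX = 2**31 - 1  # 2,147,483,647 bytes, one byte short of 2 GiB


def fits_int32_length(length):
    """Return True if a byte length can be expressed as a signed 32-bit size."""
    return 0 <= length <= INT32_MAX


def should_probe_version(path):
    """Hypothetical guard: only inspect files small enough for an Int32 size."""
    return fits_int32_length(os.path.getsize(path))
```

A native library larger than `INT32_MAX` bytes fails this check, which matches the "> 2GB" threshold described above.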
dsyme commented 3 years ago

It looks like the specific problem with TryLoadManagedAssemblyMetadata was fixed 3 months ago here: https://github.com/dotnet/runtime/pull/50237

So this fix will only be in .NET 6

dsyme commented 3 years ago

OK, cool: dotnet / MSBuild can handle > 2GB native binaries as of 6.0.100-preview.5.21302.13. I've tested that in the DiffSharp repository. I will adjust the README to indicate that this toolchain is necessary when consuming libtorch via the Linux CUDA package.

So we now have both 1.9.0 and Mac support done.