dotnet / TorchSharp

A .NET library that provides access to the library that powers PyTorch.
MIT License

torch.nn.Sequential throws exception from test project #891

Open xhuan8 opened 1 year ago

xhuan8 commented 1 year ago

A simple call to Sequential:

var a = torch.nn.Sequential();

works from a unit test, but throws an exception when called from a test project.

System.OverflowException: Arithmetic operation resulted in an overflow.
   at System.IntPtr.op_Explicit(IntPtr value)
   at TorchSharp.PinnedArray`1.CreateArray(IntPtr length)
   at TorchSharp.PInvoke.LibTorchSharp.THSNN_Module_get_named_parameters(HType module, AllocatePinnedArray allocator1, AllocatePinnedArray allocator2)
   at TorchSharp.torch.nn.Module._named_parameters()
   at TorchSharp.torch.nn.Module..ctor(IntPtr handle, Nullable`1 boxedHandle, Boolean ownsHandle)
   at TorchSharp.torch.nn.Module`2..ctor(IntPtr handle, IntPtr boxedHandle)
   at TorchSharp.Modules.Sequential..ctor(IntPtr handle)
   at TorchSharp.torch.nn.Sequential()
   at TorchSharpTest.MainWindow.button3_Click(Object sender, RoutedEventArgs e) in D:\Code\TorchSharpTest\TorchSharpTest\TorchSharpTest\MainWindow.xaml.cs:line 157

Trying to access the size variable in the native code still throws an exception:

void THSNN_Module_get_named_parameters(const NNModule module, Tensor* (*allocator1)(size_t length), const char** (*allocator2)(size_t length))
{
    auto parameters = (*module)->named_parameters();
    auto size = parameters.size(); // try it here
    Tensor* result1 = allocator1(parameters.size());
    const char** result2 = allocator2(parameters.size());

    for (size_t i = 0; i < parameters.size(); i++)
    {
        result1[i] = ResultTensor(parameters[i].value());
        result2[i] = make_sharable_string(parameters[i].key());
    }
}

System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
   at TorchSharp.PInvoke.LibTorchSharp.THSNN_Module_get_named_parameters(HType module, AllocatePinnedArray allocator1, AllocatePinnedArray allocator2)
   at TorchSharp.torch.nn.Module._named_parameters()
   at TorchSharp.torch.nn.Module..ctor(IntPtr handle, Nullable`1 boxedHandle, Boolean ownsHandle)
   at TorchSharp.torch.nn.Module`2..ctor(IntPtr handle, IntPtr boxedHandle)
   at TorchSharp.Modules.Sequential..ctor(IntPtr handle)
   at TorchSharp.torch.nn.Sequential()
   at TorchSharpTest.MainWindow.button3_Click(Object sender, RoutedEventArgs e) in D:\Code\TorchSharpTest\TorchSharpTest\TorchSharpTest\MainWindow.xaml.cs:line 157

NiklasGustafsson commented 1 year ago

Could you post a little bit more of your code? I'd like to reproduce it, but what does the code look like that causes the error?

xhuan8 commented 1 year ago

Thanks. There is only one line of code, in a WPF project:

private void button3_Click(object sender, RoutedEventArgs e)
{
    try
    {
        var resNet = torch.nn.Sequential();
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.ToString());
    }
}

NiklasGustafsson commented 1 year ago

So, nothing happens in a console app, presumably? I wonder if things work better if there's some call to TorchSharp on the main thread when the app starts -- some threading issue... If there's a first call on the main thread, maybe that will initialize things.

I'm really grasping at straws. I'll have to look into this later this week.

xhuan8 commented 1 year ago

OK, that's strange: it also occurs in a console application.

NiklasGustafsson commented 1 year ago

I wonder if it's because the Sequential is empty.

xhuan8 commented 1 year ago

I tried version 99.2 and the problem is gone, so maybe my local code is corrupt.

xhuan8 commented 1 year ago

It happens on the latest code when built from source. All tests pass, but Sequential still throws an exception,

even with some submodules as parameters:

var resNet = torch.nn.Sequential(("lin1", nn.Linear(1000, 100, false)));

Version 99.2 from NuGet works, though, so maybe the packaging process has a problem.

xhuan8 commented 1 year ago

THSNN_Module_get_named_parameters fails to retrieve the parameters from the module; the pointer seems to point to an invalid address:

System.OverflowException: Arithmetic operation resulted in an overflow.
   at System.IntPtr.op_Explicit(IntPtr value)
   at TorchSharp.PinnedArray`1.CreateArray(IntPtr length)
   at TorchSharp.PInvoke.LibTorchSharp.THSNN_Module_get_named_parameters(HType module, AllocatePinnedArray allocator1, AllocatePinnedArray allocator2)
   at TorchSharp.torch.nn.Module._named_parameters()
   at TorchSharp.torch.nn.Module..ctor(IntPtr handle, Nullable`1 boxedHandle, Boolean ownsHandle)
   at TorchSharp.torch.nn.Module`2..ctor(IntPtr handle, IntPtr boxedHandle)
   at TorchSharp.Modules.Linear..ctor(IntPtr handle, IntPtr boxedHandle)
   at TorchSharp.torch.nn.Linear(Int64 inputSize, Int64 outputSize, Boolean hasBias, Device device, Nullable`1 dtype)
   at TorchSharpTest.MainWindow.button3_Click(Object sender, RoutedEventArgs e) in D:\Code\TorchSharpTest\TorchSharpTest\TorchSharpTest\MainWindow.xaml.cs

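
One possible reading of the OverflowException, offered as a sketch rather than a confirmed diagnosis: if the module pointer is invalid, parameters.size() can return an arbitrary value, and the managed allocator then fails when it narrows that length to a 32-bit int (the IntPtr.op_Explicit frame in the trace). The C++ below only mimics that checked narrowing. checked_length is a hypothetical helper, not part of TorchSharp or LibTorchSharp, and the 64-bit input type is an assumption about the platform.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical helper, not part of TorchSharp: narrows a native length to a
// 32-bit signed int and throws when the value does not fit, similar in spirit
// to a checked IntPtr-to-int conversion on a 64-bit platform.
inline std::int32_t checked_length(std::uint64_t length)
{
    const auto max32 =
        static_cast<std::uint64_t>(std::numeric_limits<std::int32_t>::max());
    if (length > max32)
        throw std::overflow_error("length does not fit in a 32-bit int");
    return static_cast<std::int32_t>(length);
}
```

A plausible length such as 3 passes through unchanged, while a garbage value read through a bad pointer (for instance 0xCDCDCDCDCDCDCDCD, a made-up example) makes the helper throw before any array is allocated.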
NiklasGustafsson commented 1 year ago

@xhuan8 -- is this still happening?

xhuan8 commented 1 year ago

I haven't tested on the latest version.