dotnet / TorchSharp

A .NET library that provides access to the library that powers PyTorch.
MIT License

System.AccessViolationException Error when using TorchSharp #1292

Open rapid-18 opened 5 months ago

rapid-18 commented 5 months ago

A System.AccessViolationException occurs when executing one simple line of code:

model = jit.load(modelPath, DeviceType.CUDA);

How can I fix that? The TorchSharp version is TorchSharp-cuda-windows 0.100.7. The details of the error:

System.AccessViolationException
HResult=0x80004003
Message=Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
Source=
StackTrace:

yueyinqiu commented 5 months ago

It works for me (the exception shown here is not thrown by jit.load):

[Screenshot]

Could you please give us more details, such as the specific model file?

By the way, could the problem be solved by updating the package? I'm using 0.102.4.

rapid-18 commented 5 months ago

I can hardly provide any more details because it's just a simple PyTorch-trained model, and the problem also occurs in many other calls related to loading models, such as nn.Module Model = torchvision.models.resnet50(); so I don't think the problem comes from the model. But thank you anyway, and I will try a newer version.

NiklasGustafsson commented 5 months ago

Yes, please start by upgrading to the most recent version of TorchSharp -- it's very hard (let's call it impossible), given limited resources, for us to troubleshoot a version as old as 0.100.7, which is based on an earlier version of libtorch.

rapid-18 commented 5 months ago

The problem still occurs when loading models, even after I changed to version 0.102.4.

NiklasGustafsson commented 5 months ago

Okay, that's good to know; it makes it easier to troubleshoot.

Can you please show the Python code used to generate the "exported.method.dat" file?

yueyinqiu commented 5 months ago

@NiklasGustafsson The exported.method.dat file was created by me, not by @rapid-18.

And it works for me. The exception in my screenshot occurred only because I didn't pass the required parameter; it's not the AccessViolationException we are talking about. Sorry for the misleading screenshot.

travisjj commented 1 month ago

This type of exception appears to be caused by the user's system running out of memory.

Because the code being executed uses pointers, the out-of-memory condition is only vaguely reported, as a memory access violation. I have found that expanding virtual memory can alleviate this (for users with SSDs). However, if your model is excessively large, consider trialing a smaller version to verify that it runs and then porting it to a cloud GPU.

The memory issue can be verified by gradually expanding system memory and observing that the AccessViolationException occurs at different places in the code.
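
One way to collect evidence for this is to log the process's memory figures immediately before and after the load call, so the last line printed before a crash shows how much memory was in use. The following is only a minimal sketch, not a definitive diagnostic: modelPath and its value are hypothetical, the LogMemory helper is invented for illustration, and the jit.load call is simply the one quoted at the top of this issue. Logging is done before the call because an AccessViolationException typically cannot be caught with try/catch on .NET Core and later.

```csharp
using System;
using System.Diagnostics;
using TorchSharp;
using static TorchSharp.torch;

// Hypothetical path (assumption): replace with your own TorchScript file.
string modelPath = "model.pt";

LogMemory("before load");
var model = jit.load(modelPath, DeviceType.CUDA); // the call from the original report
LogMemory("after load");

// Illustrative helper (not part of TorchSharp): prints this process's memory use.
static void LogMemory(string label)
{
    using var process = Process.GetCurrentProcess();
    const double MiB = 1024.0 * 1024.0;
    Console.WriteLine(
        $"[{label}] working set: {process.WorkingSet64 / MiB:F0} MiB, " +
        $"private bytes: {process.PrivateMemorySize64 / MiB:F0} MiB, " +
        $"paged bytes: {process.PagedMemorySize64 / MiB:F0} MiB");
}
```

If the figures printed just before the crash sit close to the machine's physical plus virtual memory limit, that is consistent with the out-of-memory explanation, although it does not prove it. These counters cover host memory only; GPU memory would have to be checked separately.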

The underlying problem of using too much memory can have a wide variety of causes.