Open Utanapishtim31 opened 8 months ago
I have been able to create a small C# app that reproduces the problem. You will have to run it in Debug from Visual Studio, with "Enable native code debugging" checked under the project's Properties > Debug.
The app runs 50 predictions and then closes on its own. The crash occurs during that shutdown: "Unhandled exception at 0x00007FF99875F61E (ucrtbase.dll) in TensorflowBufferOverflow.exe: Fatal program exit requested." is raised in SafeEagerTensorHandle.ReleaseHandle().
You will probably have to run the app several times before the exception is raised.
Please note that in my real application this exception is raised while the application is running, not only when it closes, which makes it much more critical. Furthermore, in my app the exception is raised in SafeTensorHandle.ReleaseHandle(), called from SafeStringTensorHandle.ReleaseHandle(), so it is not exactly the same error as here. I hope the two are similar enough that a fix can be applied to both classes.
Description
My application uses a model and regularly calls predict(). After a dozen or so calls to predict() (the exact number varies), my application crashes.
After a very long investigation, I found that data had been written past the end of a buffer (i.e. a buffer overflow).
The Windows debugger displays the following message:
Another debug message states that data has been written past the end of a memory buffer.
The call stack after the exception shows that this memory buffer is managed by a SafeTensorHandle:
By tagging the SafeTensorHandles created during the lifetime of my application, I found that the handle causing the crash is the one contained in a SafeStringTensorHandle, namely the handle 'safeTensorHandle' created in Tensor.StringTensor():
This SafeStringTensorHandle is created during model.predict() and contains the optimization options. The call stack at its creation looks like this:
This bug is very serious because it prevents me from deploying my application to my customers.
Reproduction Steps
I have not been able to create a minimal application that reproduces the bug, primarily because it occurs randomly, depending on when the GC decides to finalize and release the handles.
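Because the timing depends on when the GC finalizes handles, one common way to make a repro more deterministic is to force a full collection and wait for pending finalizers at a known point, for example right after each model.predict() call. The sketch below only illustrates that technique and does not use Tensorflow.NET: DemoNativeHandle and HandleReleaseDemo are hypothetical stand-ins, with DemoNativeHandle playing the role of a SafeTensorHandle that frees unmanaged memory in ReleaseHandle().

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Threading;

// Hypothetical stand-in for a native tensor handle: it owns a block of
// unmanaged memory and frees it in ReleaseHandle(), which the runtime calls
// when the handle is disposed or finalized by the GC.
sealed class DemoNativeHandle : SafeHandle
{
    public static int Released; // number of ReleaseHandle() calls so far

    public DemoNativeHandle(int size) : base(IntPtr.Zero, ownsHandle: true)
    {
        SetHandle(Marshal.AllocHGlobal(size));
    }

    public override bool IsInvalid => handle == IntPtr.Zero;

    protected override bool ReleaseHandle()
    {
        Marshal.FreeHGlobal(handle);
        Interlocked.Increment(ref Released);
        return true;
    }
}

static class HandleReleaseDemo
{
    // Allocate the handles in a separate, non-inlined method so that no
    // local variable keeps them reachable once this method returns.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void CreateUndisposedHandles(int count)
    {
        for (int i = 0; i < count; i++)
        {
            _ = new DemoNativeHandle(64); // deliberately never disposed
        }
    }

    // Creates undisposed handles, then forces the GC to run their
    // finalizers (and thus ReleaseHandle()) now, at a known point.
    // Returns how many handles were released during this call.
    public static int CreateAndCollect(int count)
    {
        int before = Volatile.Read(ref DemoNativeHandle.Released);
        CreateUndisposedHandles(count);
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        return Volatile.Read(ref DemoNativeHandle.Released) - before;
    }

    static void Main()
    {
        Console.WriteLine($"Released {CreateAndCollect(50)} handles");
    }
}
```

In a real repro, the three GC calls would go after each model.predict() call, so that any faulty ReleaseHandle() runs immediately after the prediction that created the handle instead of at an arbitrary later time.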
Known Workarounds
No workaround found.
Configuration and Other Information
Tensorflow.NET 0.110.4
Tensorflow.Keras 0.11.4
Windows 11