ericsink / SQLitePCL.raw

A Portable Class Library (PCL) for low-level (raw) access to SQLite
Apache License 2.0
512 stars 106 forks source link

Memory corruption when doing DB operations #409

Closed juanpfry closed 3 years ago

juanpfry commented 3 years ago

Hello!, We are trying to narrow out a memory corruption: "AccessViolationException" or "ExecutionEngineExceptions" in other computers.

We believe the problem is similar to the following: https://github.com/ericsink/SQLitePCL.raw/issues/321

We first suspected of a threading issue while we were stress testing the issue to find a repro, but even with a global mutex the problem occurred, so after reading issue #321 we tried stopping the GC while in DB operations and that brought down the problem from always crashing during stress to never crashing again.

This is one of the crashes we get image

... but we got others as well ... where the stack is busted, so hard to say.

Which other information can help to track down this problem? We are not familiar with the code so not sure if we can pinpoint the problem directly.

ericsink commented 3 years ago

What version are you using? If this is #321, it was fixed in 2.0.4.

juanpfry commented 3 years ago

Hi Eric, we are getting this error in 2.0.4

juanpfry commented 3 years ago

321 merge seems to be mentioning that it fixes the particular instance, but that it would be possible of other situations where a dangling pointer (or an uncounted reference to an object) make it out of a protected scope leading to the garbage collector to cause mayhem.

Given that the repro seems to be strongly tied to GC and the crashes materialize when trying to string a utf8 pointer, we have strong reasons to suspect that there might be other instances of the #321 bug. Unfortunately we are not smart enough to figure out in the codebase where it can be.

juanpfry commented 3 years ago

It is crashing here: https://github.com/ericsink/SQLitePCL.raw/blob/fcddb4b28f0c9981f0617d4f9ae7e8acb58c0b82/src/SQLitePCLRaw.core/utf8z.cs#L172

Which indicates that the content of the sp member variable is dangling.

Looking on how this read only structure is built I'm unsure how the memory is protected, in multiple places:

How is the life range of the memory protected in this function? https://github.com/ericsink/SQLitePCL.raw/blob/fcddb4b28f0c9981f0617d4f9ae7e8acb58c0b82/src/SQLitePCLRaw.core/utf8z.cs#L149-L159

ericsink commented 3 years ago

I am looking more closely at this now. I haven't worked on this issue in a while, so the details need to be swapped back into my brain's RAM. :-)

ericsink commented 3 years ago

"How is the life range of the memory protected in this function?"

Apparently it is not. :-(

There are fixes I need to make here. I seem to have gotten too fanatical about not copying strings. Thanks for the investigative work. I'll followup here when I make the fixes.

juanpfry commented 3 years ago

Cool, we are trying to build and debug to narrow down the problem. We are sorry for not being of more help :(.

juanpfry commented 3 years ago

We were able to build the package into our application.

We replaced many of the Span creations per copies, but still not sure why we get this sort of errors. Not sure if calling the GC.Collect triggers, exposes or is the cause of the crash ...

image

ericsink commented 3 years ago

That's a very strange error. Tell me again which version of .NET are you on?

juanpfry commented 3 years ago

We are on 5.0.2

By making the code to copy the strings where it comes from the native code the crashes are reduced a lot, but the app still crashes after an hour or so (we might be missing some case somewhere?).

sqlite strings live range seems to be tied to the next Next() function call, I wonder if some enumeration build is going in parallel or something like that is affecting the memory.

Also I have read around that if you call GC.Collect too often it can triggers the exception, so not sure if some of the crashes we experience when stressing the app may not be the same crashes when we do not use the GC.Collect to accelerate the crash.

Unfortunately, when we see the crash the stack is already corrupted and not sure it happened even close to where the crash is pointed out.

ericsink commented 3 years ago

One confusing thing here is that the docs say that exception is obsolete and no longer used:

https://docs.microsoft.com/en-us/dotnet/api/system.executionengineexception?view=net-5.0

juanpfry commented 3 years ago

We are puzzled as well.

juanpfry commented 3 years ago

We now believe that problem was not related to this code, SQLitePCL was just a victim of a memory corruption at a lower level in Dot Net 5. Sorry for the misreporting. https://github.com/dotnet/runtime/issues/51350