ericsink / SQLitePCL.raw

A Portable Class Library (PCL) for low-level (raw) access to SQLite
Apache License 2.0
512 stars 106 forks source link

Unexpected System.AccessViolationException when using Microsoft.Data.Sqlite #430

Closed EYHN closed 2 years ago

EYHN commented 2 years ago

Hi, I'm using this library in my personal project and when executing a query, there is a probability that a System.AccessViolationException will occur.

I am sorry that I cannot find the cause of the problem. I came here for some help

The error log is as follows:

Fatal error. System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
   at SQLitePCL.SQLite3Provider_dynamic_cdecl.SQLitePCL.ISQLite3Provider.sqlite3_prepare_v2(SQLitePCL.sqlite3, System.ReadOnlySpan`1<Byte>, IntPtr ByRef, System.ReadOnlySpan`1<Byte> ByRef)
   at Microsoft.Data.Sqlite.SqliteCommand+<PrepareAndEnumerateStatements>d__64.MoveNext()
   at Microsoft.Data.Sqlite.SqliteCommand+<GetStatements>d__54.MoveNext()
   at Microsoft.Data.Sqlite.SqliteDataReader.NextResult()
   at Microsoft.Data.Sqlite.SqliteCommand.ExecuteReader(System.Data.CommandBehavior)
   at ConsoleApp1.Program.Main(System.String[])

I have stable reproduction code as follows: (Usually runs will crash within 30 seconds)

using System;
using System.Data;
using Microsoft.Data.Sqlite;

namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {
            using (var connection =
                new SqliteConnection(
                    @"Data Source=C:\Users\eyhn\Projects\Anything\core\Anything\bin\Debug\net5.0\cache\tracker.db"))
            {
                connection.Open();

                while (true)
                {
                    using (connection.BeginTransaction())
                    {
                        var queryCommand = connection.CreateCommand();
                        queryCommand.CommandText = $@"
                        SELECT Id, Url, Parent, IsDirectory, IdentifierTag, ContentTag FROM FileTracker WHERE Url=?1;
                        ";
                        queryCommand.Parameters.Add(new SqliteParameter("?1", "file://local/"));

                        var reader = queryCommand.ExecuteReader();

                        var result = HandleReaderSingleDataRow(reader);

                        Console.WriteLine(result?.Id + "");
                    }
                }
            }
        }

        private static DataRow HandleReaderSingleDataRow(IDataReader reader)
        {
            if (!reader.Read())
            {
                return null;
            }

            return new DataRow(
                reader.GetInt64(0),
                reader.GetString(1),
                reader.GetValue(2) as long?,
                reader.GetBoolean(3),
                reader.GetValue(4) as string,
                reader.GetValue(5) as string);
        }

        public record DataRow(long Id, string Url, long? Parent, bool IsDirectory, string IdentifierTag,
            string ContentTag);
    }
}

The package version i used:

> dotnet list package --include-transitive
Project 'ConsoleApp1' has the following package references
   [net5.0]: 
   Top-level Package            Requested                 Resolved
   > Microsoft.Data.Sqlite      6.0.0-preview.6.21352.1   6.0.0-preview.6.21352.1

   Transitive Package                         Resolved
   > Microsoft.Data.Sqlite.Core               6.0.0-preview.6.21352.1
   > SQLitePCLRaw.bundle_e_sqlite3            2.0.5-pre20210521085756
   > SQLitePCLRaw.core                        2.0.5-pre20210521085756
   > SQLitePCLRaw.lib.e_sqlite3               2.0.5-pre20210521085756
   > SQLitePCLRaw.provider.dynamic_cdecl      2.0.5-pre20210521085756
   > System.Memory                            4.5.3

After my attempts, the same problem appeared on windows pc and macbook.

I'll upload the database file I'm using, and this file seems to be the only one that's having problems.

database.zip

bricelam commented 2 years ago

Hmm... using the span overload of sqlite3_prepare_v2 is new in Microsoft.Data.Sqlite 6.0.0.

ericsink commented 2 years ago

I'll try to dig into this tomorrow.

A couple months ago, based on discussion from somebody dealing with a crash, I came to suspect, but never confirmed, that there may be cases where I construct a Span that might outlive the pointer it came from.

ericsink commented 2 years ago

where I construct a Span that might outlive the pointer it came from

... but this issue does not appear to be what I was thinking about when I wrote that.

ericsink commented 2 years ago

The apparent code involved from provider.tt is in this function:

unsafe int ISQLite3Provider.sqlite3_prepare_v2(sqlite3 db, ReadOnlySpan<byte> sql, out IntPtr stm, out ReadOnlySpan<byte> tail)

in cases where the underlying C function didn't consume the entire string, the tail returns a Span made from a slice of the Span that was given:

                if (len_remain > 0)
                {
                    tail = sql.Slice(len_consumed, len_remain);
                }

and all that happens inside the fixed block, so the pointers should still be valid, and no pointers escape out to the caller.

The call from Microsoft.Data.Sqlite appears to be in SqliteCommand.cs. The source of the string is:

            var sql = new byte[byteCount + 1];
            Encoding.UTF8.GetBytes(_commandText, 0, _commandText.Length, sql, 0);

and the call itself:

                ReadOnlySpan<byte> tail;
                while (IsBusy(rc = sqlite3_prepare_v2(_connection!.Handle, sql.AsSpan(start), out stmt, out tail)))

which I'm guessing is safe. I mean, tail is going to end up as a slice of the span returned by sql.AsSpan(), which itself is not held in a local, but that should be okay?

And the sql.AsSpan() does start from start, which reflects whatever the tail gave, so...

So at the moment I don't see anything obvious, which makes me suspect that either (1) I am missing something obvious, or (2) an off-by-one error somewhere, which is the next thing I'll look at.

bricelam commented 2 years ago

I'm seeing this error a lot more consistently in my pooling branch (https://github.com/dotnet/efcore/pull/25018). Like you, I don't see anything obviously wrong with the Span code...

ericsink commented 2 years ago

No real progress yet, but here are some findings:

The test code posted by @EYHN does reproduce the problem for me on Windows.

I don't see any significant off-by-one errors that I mentioned in a previous comment.

In general, the span tends to arrive with an extra byte, the zero terminator, which we don't really need, but that seems to cause no problems. I tried lying to the native code by passing Length -1 (thus removing that extra byte) and it made no difference.

The test case string has a lot of whitespace before and after, but removing it seems to make no difference. Something in M.D.Sqlite is probably doing a Trim, since sqlite3_prepare_v2 doesn't actually get called a second time with all that whitespace.

Whether the function returns a tail span or not seems to make no difference. I hacked the code to always return ReadOnlySpan.Empty for the tail, and the crash still happens.

And also, all of the above was tested against a current build of the tree, not against 2.0.4. One major difference here is that the current tree uses DllImport for .NET Core instead of the dynamic provider. So that difference is yet another thing that makes no difference.

ericsink commented 2 years ago

The "other" overload (the one that uses utf8z instead of ReadOnlySpan) of sqlite3_prepare_v2 passes -1 for the length, which means the native code will determine the length by looking for the zero terminator.

Hacking the span overload to pass -1 (ignoring the length given in the span, since the caller seems to always be providing a zero terminator anyway) also seems to have no effect.

ericsink commented 2 years ago

I ported the repro program to use SQLitePCLRaw.ugly instead of Microsoft.Data.Sqlite, and it no longer crashes. I think my version is equivalent. The code is in test_nupkgs/bug430.

bricelam commented 2 years ago

@ajcvickers is there anyone we can pull in from the runtime team to help investigate?

ajcvickers commented 2 years ago

@pilchie Any ideas on who might be able to help here?

Pilchie commented 2 years ago

@stephentoub? Any ideas?

bricelam commented 2 years ago

Here's the PR where the change went in: https://github.com/dotnet/efcore/pull/24331

Are the while and do..while loops an issue? Or the yield return?

ericsink commented 2 years ago

@bricelam I did glare at those loops with a measure of suspicion, but I couldn't convince myself they were problematic.

Still might be interesting to move the sql.AsSpan(start) outside the while(IsBusy()) and see if the problem still repros. AFAICT, the AsSpan doesn't need to be inside the loop anyway?

bricelam commented 2 years ago

Oh wow, we're still seeing the exception after switching back to the string overload. I think Span was a red herring.

bricelam commented 2 years ago

Maybe a pinning issue in SQLitePCLRaw?

stephentoub commented 2 years ago

Sorry, I missed yesterday that I was tagged. From the most recent comments it sounds like this isn't actually due to the cited PR but is actually longer standing and not related to the new span usage?

ericsink commented 2 years ago

@bricelam Hmmm. That's interesting. And it gives me a new angle of investigation.

ericsink commented 2 years ago

Thinking out loud:

Okay, let's say Span was red herring. It was still a good guess, because of the timing, and because of the recent changed in Microsoft.Data.Sqlite to use the Span overload.

If not Span, then did something else change?

How long has this bug been around?

And if the answer is "a very long time", why are we just finding it now?

I wish we had a repro that does not involve Microsoft.Data.Sqlite. But due to the nature of the problem, it still seems likely that the bug, whatever it is, is somewhere in SQLitePCLRaw.

ericsink commented 2 years ago

How confident are we that the error seen WITH the Span overload is the same one seen after "switching back to the string overload" ?

ericsink commented 2 years ago

Clarification:

There are 6 overloads of sqlite3_prepare_v2 in raw.cs, 3 that return a tail, and 3 that do not. We're talking about the ones that return a tail.

Those 3 overloads are for string, ReadOnlySpan<byte>, and utf8z.

Previously, Microsoft.Data.Sqlite was using the string overload. Then it changed to use the Span overload.

The string overload calls the Span overload. It does not call the utf8z overload.

So AFAICT, despite something I said above, it has not been using the utf8z overload. And that's good, because the more review that code, the less I trust it.

ericsink commented 2 years ago

The search for this bug should probably include taking a fresh look at #409 and #321

bricelam commented 2 years ago

How confident are we that the error seen WITH the Span overload is the same one seen after "switching back to the string overload" ?

The stack trace is exactly the same going into sqlite3_prepare_v2 where the exception is thrown. Both throw AccessViolationException.

bricelam commented 2 years ago
AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
   at SQLitePCL.SQLite3Provider_dynamic_cdecl.SQLitePCL.ISQLite3Provider.sqlite3_prepare_v2(SQLitePCL.sqlite3, System.ReadOnlySpan`1<Byte>, IntPtr ByRef, System.ReadOnlySpan`1<Byte> ByRef)
AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
   at SQLitePCL.SQLite3Provider_dynamic_cdecl.SQLitePCL.ISQLite3Provider.sqlite3_prepare_v2(SQLitePCL.sqlite3, System.ReadOnlySpan`1<Byte>, IntPtr ByRef, System.ReadOnlySpan`1<Byte> ByRef)
   at SQLitePCL.raw.sqlite3_prepare_v2(SQLitePCL.sqlite3, System.ReadOnlySpan`1<Byte>, SQLitePCL.sqlite3_stmt ByRef, System.ReadOnlySpan`1<Byte> ByRef)
   at SQLitePCL.raw.sqlite3_prepare_v2(SQLitePCL.sqlite3, System.String, SQLitePCL.sqlite3_stmt ByRef, System.String ByRef)
ericsink commented 2 years ago

So yeah, those stack traces look the same. Both code paths end up in the Span overload anyway.

FWIW, I wanted to completely rule out the suspected problems with the utf8z overload, so I have a local build where I completely removed it. Running that with Microsoft.Data.Sqlite repro, I get no MissingMethod exceptions or anything like that, so it doesn't seem to be getting accidentally used anywhere.

Clearly, the sqlite3_prepare_v2 span overload is where the symptom happens, but I'm not sure the actual memory corruption is happening there.

bricelam commented 2 years ago

@stephentoub Can we start blaming the JIT or GC? 😏

ericsink commented 2 years ago

"Can we start blaming the JIT or GC?"

LOL.

More thinking out loud:

I still can't get the crash to happen in my repo program without Microsoft.Data.Sqlite. I have made a number of changes to make that repro similar to the way Microsoft.Data.Sqlite is doing things, but no luck.

Microsoft.Data.Sqlite is doing something different, and that something either is the bug or it is triggering the bug. But we don't know what that something is.

We do know that the changes in your pooling branch make the bug happen more often.

Stab in the dark: Is there any way threading could be an issue here? My bug430_ugly repro is definitely not creating any threads. Is that same thing true for the original repro program from the OP?

bricelam commented 2 years ago

Playing with the repro... Here's the most minimal I can make it with M.D.Sqlite. I'll try using SQLitePCLRaw directly now.

using Microsoft.Data.Sqlite;

using var connection = new SqliteConnection("Data Source=:memory:");
connection.Open();

while (true)
{
    // NB: No using
    var command = connection.CreateCommand();
    command.CommandText = "SELECT 1";

    command.ExecuteScalar();
}
ericsink commented 2 years ago

I confirm that your minimal M.D.Sqlite repro works (throws) for me as well. Wow.

bricelam commented 2 years ago

Could it be SQLITE_OPEN_NOMUTEX? We started passing that in 6.0

bricelam commented 2 years ago

Mystery solved. Adding SQLITE_OPEN_NOMUTEX to 5.0 makes it throw. Does SQLitePCLRaw use threads?

ericsink commented 2 years ago

(1) Wait, I thought you had previously said the minimal repro did throw on 5.0 ?

(2) Well, I don't think there are any threads anywhere. I'll double-check.

(3) Adding SQLITE_OPEN_NOMUTEX to my non-M.D.Sqlite repro does not seem to make it throw.

bricelam commented 2 years ago

I thought 5.0 was throwing, but when I did a full rebuild, it started working.

I suspect the concurrent access is happening on some GC thread--that's why we see it more without a using.

But maybe it has nothing to do with concurrent access and there's just a lifetime bug somewhere inside SQLite itself when SQLITE_OPEN_NOMUTEX is specified.

ericsink commented 2 years ago

Small/partial confirmation: I built a SQLitePCLRaw which strips out SQLITE_OPEN_NOMUTEX even if the caller specifies it. The minimal M.D.Sqlite repro no longer throws.

I'm using 6.0.0-preview.6.21352.1

ericsink commented 2 years ago

"Does SQLitePCLRaw use threads?"

No.

I was pretty sure, as the idea that I would spin up a thread in SQLitePCLRaw is kind of unthinkable, but this kind of investigation is the time to be 100% sure. I looked everywhere I could think of, and I never create a thread, nor (AFAICT) do I call anything that would.

ericsink commented 2 years ago

So is removing SQLITE_OPEN_NOMUTEX a viable fix for you folks?

bricelam commented 2 years ago

So is removing SQLITE_OPEN_NOMUTEX a viable fix for you folks?

Yep. It was really just added on a whim to see if we could get any perf gains from it.

ajcvickers commented 2 years ago

Thanks for all your help on this, @ericsink!

kpreisser commented 2 years ago

Hi,

I just randomly stumbled across this issue. I think it hasn't been noted yet, but to me it seems clear why exceptions like AccessViolationException can happen using the repro code in https://github.com/ericsink/SQLitePCL.raw/issues/430#issuecomment-899733647, when SQLITE_OPEN_NOMUTEX is specified for the connection (which is not a bug in SQLite or the GC/CLR):

Notice that the code creates a new SqliteCommand command in a loop without a using statement (or calling .Dispose() on it). This means that some time after the SqliteCommand is no longer referenced, the GC will eventually call the finalizer (in a separate thread) of this object, which calls Dispose(false) (since it hasn't been disposed yet).

SqliteCommand.Dispose(bool) looks like this:

        protected override void Dispose(bool disposing)
        {
            DisposePreparedStatements(disposing);
            // ...

Note it calls DisposePreparedStatements, passing the disposing parameter. Looking at this method:


        private void DisposePreparedStatements(bool disposing = true)
        {
            if (disposing
                && DataReader != null)
            {
                DataReader.Dispose();
                DataReader = null;
            }

            if (_preparedStatements != null)
            {
                foreach (var stmt in _preparedStatements)
                {
                    stmt.Dispose();
                }

                _preparedStatements.Clear();
            }

            _prepared = false;
        }

We can see that in the second if block, it iterates through the _preparedStatements and calls .Dispose() on them, regardless of whether disposing is true or false. (AFAIK this is not a correct implementation of the Dispose pattern, because when the finalizer is executed, object references on the current object might no longer be valid, and therefore the _preparedStatements shouldn't be accessed when disposing is false).

sqlite3_stmt.Dispose() in turn calls its ReleaseHandle(), which calls the SQLite native function sqlite3_finalize() in order to finalize the prepared statement.

Therefore, it can happen that the finalizer thread calls the native sqlite3_finalize() function for a specific native DB connection at the same time when the main thread calls other SQLite functions like sqlite3_prepare_v2() (within CreateCommand()/ExecuteScalar()) for the same native DB connection, which is not allowed when using SQLITE_OPEN_NOMUTEX, because then each DB connection may only be accessed by a single thread at the same time.

(Note: Even if the SqliteCommand.DisposePreparedStatements() is changed to only iterate over _preparedStatements when disposing is true, the access violation will probably still occur because then the GC will eventually finalize the sqlite3_stmt separately.)

Thanks!

jhgbrt commented 2 years ago

I'm still seeing this issue with microsoft.data.sqlite.core v 6.0.0-preview.7.21378.4 and sqlitepclraw.bundle_e_sqlite3 v2.0.6...

var connection = SqliteFactory.Instance.CreateConnection();
connection.ConnectionString = "Data Source=:memory:";
connection.Open();

while (true)
{
    var command = connection.CreateCommand();
    command.CommandText = "SELECT 1";
    command.CommandType = System.Data.CommandType.Text;
    command.ExecuteNonQuery();
}

this gives:

Fatal error. System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
Repeat 2 times:
--------------------------------
   at SQLitePCL.SQLite3Provider_e_sqlite3+NativeMethods.sqlite3_prepare_v2(SQLitePCL.sqlite3, Byte*, Int32, IntPtr ByRef, Byte* ByRef)
--------------------------------
   at SQLitePCL.SQLite3Provider_e_sqlite3.SQLitePCL.ISQLite3Provider.sqlite3_prepare_v2(SQLitePCL.sqlite3, System.ReadOnlySpan`1<Byte>, IntPtr ByRef, System.ReadOnlySpan`1<Byte> ByRef)
   at Microsoft.Data.Sqlite.SqliteCommand+<PrepareAndEnumerateStatements>d__64.MoveNext()
   at Microsoft.Data.Sqlite.SqliteCommand+<GetStatements>d__54.MoveNext()
   at Microsoft.Data.Sqlite.SqliteDataReader.NextResult()
   at Microsoft.Data.Sqlite.SqliteCommand.ExecuteReader(System.Data.CommandBehavior)
   at Microsoft.Data.Sqlite.SqliteCommand.ExecuteReader()
   at Microsoft.Data.Sqlite.SqliteCommand.ExecuteNonQuery()
   at <Program>$.<Main>$(System.String[])
ericsink commented 2 years ago

@jhgbrt I can repro this crash using the versions you specified.

But I just tried it with RC 1 (released today), and the crash did not happen.

jhgbrt commented 2 years ago

I can confirm it's fixed with rc1 of Microsoft.Data.SqLite.Core

bricelam commented 2 years ago

Yes, this was deliberately fixed on our side for RC1.