Giorgi / DuckDB.NET

Bindings and ADO.NET Provider for DuckDB
https://duckdb.net
MIT License

Heap Corruption (0xc0000374) on Appender.Close() with Large Batch Sizes (>1000 rows) #218

Open horizonchasers opened 1 week ago

horizonchasers commented 1 week ago

When attempting to insert a large number of records into a DuckDB database using DuckDB.NET, the application crashes with a heap corruption error (exit code 0xc0000374) if the batch size is too large. The crash occurs during the appender.Close() operation.

Code to recreate the issue and an executable

https://github.com/horizonchasers/DuckDBCrash

Code Overview

The main logic is in the SimplifiedFileAnalyzer class. It performs the following steps:

1. Initializes a DuckDB database
2. Loads sample data from a JSON file
3. Inserts the data into the database using an appender

The crash occurs during the data insertion phase when the batch size is too large.

Observed Behavior

With a batch size of 1000 or less, the insertion completes successfully. With a batch size of 1500 or more, the application crashes with a heap corruption error.

This has been tested on multiple machines, all running Windows 11 and .NET 8.

horizonchasers commented 1 week ago

A breakpoint instruction (__debugbreak() statement or a similar call) was executed in DuckDB.NET.MemoryExceptionCode.exe.

DuckDB.NET.Bindings.dll!DuckDB.NET.Native.DuckDBDataChunk.ReleaseHandle() Line 87   C#

ntdll.dll!RtlIsZeroMemory() + 162 bytes
ntdll.dll!__misaligned_access() + 1066 bytes
ntdll.dll!__misaligned_access() + 1802 bytes
ntdll.dll!00007ffd99851de5()
ntdll.dll!RtlGetCurrentServiceSessionId() + 4892 bytes
ntdll.dll!RtlFreeHeap() + 81 bytes
ucrtbase.dll!_free_base() + 27 bytes
duckdb.dll!duckdb::AllocatedData::~AllocatedData() + 52 bytes
duckdb.dll!duckdb::VectorCache::VectorCache() + 2566 bytes
duckdb.dll!00007ffc6a850246()
duckdb.dll!00007ffc6ab764cf()
duckdb.dll!duckdb_destroy_data_chunk() + 34 bytes
[Managed to Native Transition]
DuckDB.NET.Bindings.dll!DuckDB.NET.Native.DuckDBDataChunk.ReleaseHandle() Line 87
    at F:\src\DuckDB.NET\DuckDB.NET.Bindings\DuckDBWrapperObjects.cs(87)
System.Private.CoreLib.dll!System.Runtime.InteropServices.SafeHandle.InternalRelease(bool disposeOrFinalizeOperation)
System.Private.CoreLib.dll!System.Runtime.InteropServices.SafeHandle.Dispose()
DuckDB.NET.Data.dll!DuckDB.NET.Data.DuckDBAppender.Close() Line 86
    at F:\src\DuckDB.NET\DuckDB.NET.Data\DuckDBAppender.cs(86)
DuckDB.NET.MemoryExceptionCode.dll!SimplifiedFileAnalyzer.BulkInsertScannedFilesAsync.AnonymousMethod__0() Line 173
    at F:\src\DuckDB.NET\MemoryExceptionCode\Program.cs(173)
System.Private.CoreLib.dll!System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(System.Threading.Thread threadPoolThread, System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state)
System.Private.CoreLib.dll!System.Threading.Tasks.Task.ExecuteWithThreadLocal(ref System.Threading.Tasks.Task currentTaskSlot, System.Threading.Thread threadPoolThread)
System.Private.CoreLib.dll!System.Threading.ThreadPoolWorkQueue.Dispatch()
System.Private.CoreLib.dll!System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
[Native to Managed Transition]
kernel32.dll!BaseThreadInitThunk() + 29 bytes
ntdll.dll!RtlUserThreadStart() + 40 bytes

horizonchasers commented 1 week ago

This issue is above my pay grade, but I confirmed that the crash is in ReleaseHandle(). I think it happens once the data exceeds a certain size, which my payload reaches at batch sizes above roughly 1000. The issue existed in 1.0.2 as well.

protected override bool ReleaseHandle()
{
    try
    {
        Console.WriteLine($"Releasing DuckDBDataChunk handle: {handle}"); // Debug output
        if (handle != IntPtr.Zero)
        {
            IntPtr handleCopy = handle;
            NativeMethods.DataChunks.DuckDBDestroyDataChunk(out handleCopy);
            handle = IntPtr.Zero;
            return true;
        }
        return false;
    }
    catch (Exception ex)
    {
        Console.WriteLine($"Exception in ReleaseHandle: {ex}"); // Log the exception
        return false;
    }
}

Failed run:

Using database: F:\src\DuckDB.NET\MemoryExceptionCode\database_20240914190656.db
Initializing DuckDB...
Connection string: Data Source=F:\src\DuckDB.NET\MemoryExceptionCode\database_20240914190656.db
DuckDB initialized successfully.
Loading data from JSON file: F:\src\DuckDB.NET\MemoryExceptionCode\scanned_files_dump.json
Loaded 8841 files from JSON.
Starting bulk insert...
Inserting 8 unique file extensions...
Releasing DuckDBDataChunk handle: 1623535679856
Releasing DuckDBDataChunk handle: 1623540721504
Releasing DuckDBDataChunk handle: 1623535682656
Releasing DuckDBDataChunk handle: 1623540721504
Releasing DuckDBDataChunk handle: 1623540722384
Releasing DuckDBDataChunk handle: 1623535679856
Releasing DuckDBDataChunk handle: 1623540721504
Releasing DuckDBDataChunk handle: 1623535682656
File extensions inserted successfully.
Inserting file info in 6 batches...
Processing batch 1/6...
Releasing DuckDBDataChunk handle: 1623535683456

Fails here - NativeMethods.DataChunks.DuckDBDestroyDataChunk(out handleCopy)

Successful run:

Using database: F:\src\DuckDB.NET\MemoryExceptionCode\database_20240914190312.db
Initializing DuckDB...
Connection string: Data Source=F:\src\DuckDB.NET\MemoryExceptionCode\database_20240914190312.db
DuckDB initialized successfully.
Loading data from JSON file: F:\src\DuckDB.NET\MemoryExceptionCode\scanned_files_dump.json
Loaded 8841 files from JSON.
Starting bulk insert...
Inserting 8 unique file extensions...
Releasing DuckDBDataChunk handle: 2843311654544
Releasing DuckDBDataChunk handle: 2843311655184
Releasing DuckDBDataChunk handle: 2843311654544
Releasing DuckDBDataChunk handle: 2843311655584
Releasing DuckDBDataChunk handle: 2843311655184
Releasing DuckDBDataChunk handle: 2843311650544
Releasing DuckDBDataChunk handle: 2843311650544
Releasing DuckDBDataChunk handle: 2843311654224
File extensions inserted successfully.
Inserting file info in 9 batches...
Processing batch 1/9...
Releasing DuckDBDataChunk handle: 2843311654224
Batch 1/9 processed successfully.
Processing batch 2/9...
Releasing DuckDBDataChunk handle: 2843311653424
Batch 2/9 processed successfully.
Processing batch 3/9...
Releasing DuckDBDataChunk handle: 2843311652784
Batch 3/9 processed successfully.
Processing batch 4/9...
Releasing DuckDBDataChunk handle: 2843311651584
Batch 4/9 processed successfully.
Processing batch 5/9...
Releasing DuckDBDataChunk handle: 2843311654224
Batch 5/9 processed successfully.
Processing batch 6/9...
Releasing DuckDBDataChunk handle: 2843311651424
Batch 6/9 processed successfully.
Processing batch 7/9...
Releasing DuckDBDataChunk handle: 2843311654224
Batch 7/9 processed successfully.
Processing batch 8/9...
Releasing DuckDBDataChunk handle: 2843311652784
Batch 8/9 processed successfully.
Processing batch 9/9...
Releasing DuckDBDataChunk handle: 2843311653424
Batch 9/9 processed successfully.
Bulk insert completed successfully.
Data insertion completed successfully.
Application Execution Complete.

F:\src\DuckDB.NET\MemoryExceptionCode\bin\Debug\net8.0\DuckDB.NET.MemoryExceptionCode.exe (process 30820) exited with code 0 (0x0).
To automatically close the console when debugging stops, enable Tools->Options->Debugging->Automatically close the console when debugging stops.
Press any key to close this window . . .
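
For anyone reading the two logs side by side: the batch counts they report (6 and 9) are consistent with the batch sizes mentioned in the report (1500 and 1000) applied to the 8841 loaded records. The sketch below only checks that arithmetic. It is not the repro's batching code; `BatchSplit` and `Describe` are hypothetical names, and it assumes the records are split into fixed-size batches in order.

```csharp
using System;
using System.Linq;

public static class BatchSplit
{
    // Splits totalRows placeholder records into batches of at most batchSize
    // and returns (number of batches, size of the last batch).
    // Assumes totalRows > 0.
    public static (int Batches, int LastBatchRows) Describe(int totalRows, int batchSize)
    {
        // Enumerable.Chunk (.NET 6+) yields arrays of at most batchSize items.
        int[][] batches = Enumerable.Range(0, totalRows).Chunk(batchSize).ToArray();
        return (batches.Length, batches[^1].Length);
    }

    public static void Main()
    {
        const int totalRows = 8841; // "Loaded 8841 files from JSON." in both logs

        foreach (int batchSize in new[] { 1000, 1500 })
        {
            var (batches, last) = Describe(totalRows, batchSize);
            Console.WriteLine($"batch size {batchSize}: {batches} batches, last batch {last} rows");
        }
    }
}
```

This prints 9 batches for a batch size of 1000 and 6 batches for 1500, matching "Inserting file info in 9 batches..." in the successful run and "Inserting file info in 6 batches..." in the failed run. Other batch sizes also produce 6 batches, so this supports the reported sizes but does not prove them.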

Giorgi commented 6 days ago

Inserting 1500 rows shouldn't be a problem. This test inserts 5000 rows in one go. In fact, the appender chunks the data internally in batches of 2048 rows (the vector size in DuckDB) and appends each data chunk once it fills 2048 rows, so 1500 rows is just one data chunk. The only thing I can suggest at the moment is to compile DuckDB in Debug mode and use that Debug DLL; it might throw an assertion failure with more details.
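
To make the chunking arithmetic concrete, here is a minimal sketch of how many full 2048-row chunks a given number of appended rows fills and how many rows are left over. It only mirrors the numbers in the comment above. `ChunkMath` and `Split` are hypothetical names, not part of DuckDB.NET, and the 2048 vector size is taken from the comment as a fixed constant.

```csharp
using System;

public static class ChunkMath
{
    // Vector size quoted in the maintainer's comment; treated here as a constant.
    public const int VectorSize = 2048;

    // Returns how many chunks are completely filled by `rows` rows
    // and how many rows remain in a final, partially filled chunk.
    public static (int FullChunks, int RemainderRows) Split(int rows)
        => (rows / VectorSize, rows % VectorSize);

    public static void Main()
    {
        foreach (int rows in new[] { 1500, 2048, 5000 })
        {
            var (full, rest) = Split(rows);
            Console.WriteLine($"{rows} rows: {full} full chunk(s), {rest} row(s) left over");
        }
    }
}
```

Under these assumptions, 1500 rows never fill a chunk, so the whole batch sits in a single partial chunk, while 5000 rows fill two chunks and leave 904 rows for a third.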