Giorgi / DuckDB.NET

Bindings and ADO.NET Provider for DuckDB
https://duckdb.net
MIT License
338 stars 61 forks source link

Error in large appender add #160

Closed Cricle closed 3 months ago

Cricle commented 7 months ago

At that time I want sync mysql to duckdb, I use appender to insert ~ 50w rows.

I after checked the data consistence, some TIMESTAMP value is error.

Like that

image

But I try insert or update this row, It will be ok.

INSERT INTO "juhe1" (_id, datetime, ja1, ja2, sa3, ca4)
VALUES (10022702, '2023-07-06 18:59:03', '2010-07-30 15:07:32', 11, '327.84', 'gNcTiI06wP')
ON CONFLICT (_id)
DO UPDATE SET datetime=EXCLUDED.datetime, ja1=EXCLUDED.ja1, ja2=EXCLUDED.ja2, sa3=EXCLUDED.sa3, ca4=EXCLUDED.ca4;

The append code is

public static Action<DuckDBAppenderRow, IDataReader>[] BuildFetcher(IDataRecord record)
{
    var fetcher = new Action<DuckDBAppenderRow, IDataReader>[record.FieldCount];
    for (int i = 0; i < fetcher.Length; i++)
    {
        var index = i;
        var type = record.GetFieldType(i);
        if (type == typeof(bool))
            fetcher[i] = (row, reader) => row.AppendValue(reader.GetBoolean(index));
        else if (type == typeof(sbyte))
            fetcher[i] = (row, reader) => row.AppendValue((sbyte)reader.GetByte(index));
        else if (type == typeof(byte))
            fetcher[i] = (row, reader) => row.AppendValue(reader.GetByte(index));
        else if (type == typeof(short))
            fetcher[i] = (row, reader) => row.AppendValue(reader.GetInt16(index));
        else if (type == typeof(ushort))
            fetcher[i] = (row, reader) => row.AppendValue((ushort)reader.GetInt16(index));
        else if (type == typeof(int))
            fetcher[i] = (row, reader) => row.AppendValue(reader.GetInt32(index));
        else if (type == typeof(uint))
            fetcher[i] = (row, reader) => row.AppendValue((uint)reader.GetInt32(index));
        else if (type == typeof(long))
            fetcher[i] = (row, reader) => row.AppendValue(reader.GetInt64(index));
        else if (type == typeof(ulong))
            fetcher[i] = (row, reader) => row.AppendValue((ulong)reader.GetInt64(index));
        else if (type == typeof(float))
            fetcher[i] = (row, reader) => row.AppendValue(reader.GetFloat(index));
        else if (type == typeof(double))
            fetcher[i] = (row, reader) => row.AppendValue(reader.GetDouble(index));
        else if (type == typeof(decimal))
            fetcher[i] = (row, reader) => row.AppendValue((double)reader.GetDecimal(index));
        else if (type == typeof(DateTime))
            fetcher[i] = (row, reader) => row.AppendValue(reader.GetDateTime(index));
        else if (type == typeof(string))
            fetcher[i] = (row, reader) => row.AppendValue(reader.GetString(index));
        else
            throw new NotSupportedException(type.ToString());
    }
    return fetcher;
}
//Here is execute code
var res = 0;
var fetcher = BuildFetcher(reader);
using (var appender = duckDBConnection.CreateAppender(tableName))
{
    var fieldCount = reader.FieldCount;
    while (reader.Read())
    {
        var row = appender.CreateRow();
        for (int i = 0; i < fieldCount; i++)
        {
            fetcher[i](row, reader);
        }
        row.EndRow();
        res++;
    }
}
return Task.FromResult(res);

The probability of this occurring is approximately once every thousands of lines.

Giorgi commented 7 months ago

What error do the timestamp values have?

Cricle commented 7 months ago

What error do the timestamp values have?

The timestamp in some rows will error for inserted.

Like inserted '2020-01-01 00:00:00', when query result will be '2030-02-03 12:01:02'. I don't know it is duckdb error or the library error. I try use insert into to insert ~50w rows data, some result is error, I try insert ~15w rows, It will be ok.

Giorgi commented 7 months ago

Do you get a wrong result for the same row and column every time you run the code or is it different on each run? If it is the same every time you could put a breakpoint to see what the DuckDB.Net code does for that field to see if there is an error in converting DateTime to DuckDBTimestampStruct in DuckDBDateOnly.FromDateTime or DuckDBTimeOnly.FromDateTime

Cricle commented 7 months ago

Every time same row is wrong. I debug for check native api result, the DuckDBTimestamp is right, But I don't know if the others are correct.

Giorgi commented 7 months ago

Does it happen if you try to import only that one row?

Cricle commented 7 months ago

No, I try appender or insert is ok.

Giorgi commented 7 months ago

If you can create a project that reproduces the issue I can try to have a look at it. It will be easier if you export MySQL data to a CSV file and use that in the demo so that I don't have to set up a database.

Cricle commented 7 months ago

Ok, I will try to write a mini project and export all data.

Thanks.

Cricle commented 7 months ago

AppendFail.zip

My time zone is UTC+8.

Giorgi commented 7 months ago

Can you modify source code to read from CSV instead of MySQL?

Cricle commented 7 months ago

AppendFail.zip Already modify to read csv

Giorgi commented 3 months ago

@Cricle Can you check if you still experience this issue with the 0.10.1 version?

Giorgi commented 3 months ago

Closing due to inactivity. If you still experience the issue feel free to reopen it.