apecloud / myduckserver

MySQL & Postgres Analytics, Reimagined
184 stars 8 forks source link

feat: add initial support for 'LOAD DATA LOCAL INFILE' #78

Closed fanyang01 closed 2 months ago

fanyang01 commented 2 months ago

Resolves: #69

This PR adds the initial support for LOAD DATA [LOCAL] INFILE and enables most of the test cases.

In addition, this PR also adds the initial support for transactions. To make this possible, I have done the following refactoring:

GaoYusong commented 2 months ago

TestIsPureDataQuery can make itself pure by using enginetest.NewDefaultMemoryHarness() instead of NewDefaultDuckHarness

fanyang01 commented 2 months ago

TestIsPureDataQuery can make itself pure by using enginetest.NewDefaultMemoryHarness() instead of NewDefaultDuckHarness

Get it, thanks!

With the changes in this PR, the query engine test has become significantly slower. At present, I am uncertain about the root cause of this issue.

GaoYusong commented 2 months ago

TestIsPureDataQuery can make itself pure by using enginetest.NewDefaultMemoryHarness() instead of NewDefaultDuckHarness

Get it, thanks!

With the changes in this PR, the query engine test has become significantly slower. At present, I am uncertain about the root cause of this issue.

I log every query execution time to a CSV file query_times.csv in my branch. You can run it with the following code in your branch to generate the same file. This might help identify the root cause.

func TestQueriesSimple(t *testing.T) {
    harness := NewDefaultDuckHarness()

    harness.QueriesToSkip(notApplicableQueries...)
    harness.QueriesToSkip(waitForFixQueries...)
    QueriesTestInternal(t, harness)
}

func QueriesTestInternal(t *testing.T, harness enginetest.Harness) {
    // Open a file to record query times as CSV
    f, err := os.OpenFile("query_times.csv", os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
    if err != nil {
        t.Fatalf("Failed to open CSV file: %v", err)
    }
    defer f.Close()

    writer := csv.NewWriter(f)
    defer writer.Flush()

    // Write CSV header if file is empty
    if stat, _ := f.Stat(); stat.Size() == 0 {
        if err := writer.Write([]string{"Timestamp", "Query", "Duration"}); err != nil {
            t.Fatalf("Failed to write CSV header: %v", err)
        }
    }

    harness.Setup(setup.SimpleSetup...)
    e := mustNewEngine(t, harness)
    defer e.Close()
    ctx := enginetest.NewContext(harness)
    for _, tt := range queries.QueryTests {
        t.Run(tt.Query, func(t *testing.T) {
            if sh, ok := harness.(enginetest.SkippingHarness); ok {
                if sh.SkipQueryTest(tt.Query) {
                    t.Skipf("Skipping query plan for %s", tt.Query)
                }
            }
            if enginetest.IsServerEngine(e) && tt.SkipServerEngine {
                t.Skip("skipping for server engine")
            }

            start := time.Now()
            enginetest.TestQueryWithContext(t, ctx, e, harness, tt.Query, tt.Expected, tt.ExpectedColumns, nil, nil)
            duration := time.Since(start)

            // Write query time to CSV
            if err := writer.Write([]string{time.Now().Format(time.RFC3339), tt.Query, duration.String()}); err != nil {
                t.Errorf("Failed to write to CSV: %v", err)
            }
            writer.Flush()
        })
    }
}
fanyang01 commented 2 months ago

@GaoYusong The slowdown occurred due to an excess of unclosed sql.DB instances, as each call to NewEngine now creates a new sql.DB. To address this, I've added a DuckTestEngine wrapper that overrides the Close() method of sqle.Engine to ensure all open transactions and connections are properly closed.

fanyang01 commented 2 months ago

The Test Query Engine CI step runs much faster than the main branch now (~1min vs ~4min).

GaoYusong commented 2 months ago

The Test Query Engine CI step runs much faster than the main branch now (~1min vs ~4min).

Cool! Is this due to the one-to-one mapping between the frontend and backend to make it more cache-friendly?

fanyang01 commented 2 months ago

It seems that previously we left many connections unclosed, causing the test to become slower and slower.

GaoYusong commented 2 months ago

It seems that previously we left many connections unclosed, causing the test to become slower and slower.

Get it😂