apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0

Auto-update mechanism for dataframe test #10373

Open jayzhan211 opened 2 weeks ago

jayzhan211 commented 2 weeks ago

Is your feature request related to a problem or challenge?

While working on #10364, I found that changing the expected result in a Rust test is quite painful.

Currently, we need to fix the expected strings manually, one by one.

It would be nice if there were an easy way to update the test.

In sqllogictest, we can easily do this with the --complete flag.

Describe the solution you'd like

Given a test like the one below, I'd like a very easy way to auto-update the expected result string:

#[tokio::test]
async fn test_fn_upper() -> Result<()> {
    let expr = upper(col("a"));

    let expected = [
        "+---------------+",
        "| upper(test.a) |",
        "+---------------+",
        "| ABCDEF        |",
        "| ABC123        |",
        "| CBADEF        |",
        "| 123ABCDEF     |",
        "+---------------+",
    ];
    assert_fn_batches!(expr, expected);

    Ok(())
}

Approach 1

One possible solution is to write the result to a file and compare against it, similar to sqllogictest. Since we still need to call the expr API, the API calls would remain in the Rust test, and only the output would go to the output file.
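A minimal sketch of how Approach 1 could look, using only the standard library. The helper names (`check_expected`, `update_requested`) and the `UPDATE_EXPECTED` environment variable are hypothetical and not part of DataFusion or sqllogictest; the idea is only that the test keeps its expr API calls, while the formatted output is compared with, or rewritten into, a file.

```rust
use std::{env, fs, io, path::Path};

/// Compares `actual` with the expected-output file at `path`.
///
/// When `update` is true, the file is (re)written with `actual` instead of
/// being compared, which plays the role of sqllogictest's `--complete`.
/// Returns `Ok(true)` on a match or after an update, `Ok(false)` on a mismatch.
fn check_expected(path: &Path, actual: &str, update: bool) -> io::Result<bool> {
    if update {
        // Create the directory for the output file if it does not exist yet.
        if let Some(parent) = path.parent() {
            fs::create_dir_all(parent)?;
        }
        fs::write(path, actual)?;
        return Ok(true);
    }
    let expected = fs::read_to_string(path)?;
    Ok(expected == actual)
}

/// Hypothetical switch (an assumption of this sketch): update mode is
/// enabled by running `UPDATE_EXPECTED=1 cargo test`.
fn update_requested() -> bool {
    env::var("UPDATE_EXPECTED").map(|v| v == "1").unwrap_or(false)
}
```

In a test, the formatted result of the expr call would be passed as `actual`, for example `assert!(check_expected(path, &actual, update_requested())?)`, so that a changed result can be accepted by rerunning the test in update mode rather than editing strings by hand.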

Approach 2

Based on https://github.com/apache/datafusion/issues/8736, we can switch between the SQL string and the Expr and compare the result the way sqllogictest does.

run_query might look like this:

    let ctx = SessionContext::new();
    // One CSV table per test file, sharing its name, so
    // tests/data/example.csv is the table for tests in tests/data/example.slt
    let df = ctx.read_csv("tests/data/example.csv", CsvReadOptions::new()).await?;
    // The Expr equivalent of the SQL under test
    let expr = ascii(col("a"));
    let batches = df.select(vec![expr])?.collect().await?;
    // Format the batches as a table so they can be compared with the
    // expected string, as sqllogictest does
    let actual = datafusion::arrow::util::pretty::pretty_format_batches(&batches)?.to_string();
    assert_eq!(actual, "expected string");

Describe alternatives you've considered

No response

Additional context

No response

alamb commented 1 week ago

Something we have used to great effect in influxdb is https://insta.rs/

You can then do the equivalent of sqllogictest --complete (even for results within files) with a command like

cargo insta review

One downside is that it is yet another dependency (and to use it you need to install cargo-insta with cargo install cargo-insta)