apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0
6.31k stars 1.19k forks source link

Auto-update mechanism for dataframe test #10373

Open jayzhan211 opened 6 months ago

jayzhan211 commented 6 months ago

Is your feature request related to a problem or challenge?

While working on #10364, I found that changing the result in the rust test is quite painful.

Currently, we need to fix the string manually one by one

It would be nice if there is an easy way to update the test.

In sqllogictest, we can easily done it with --complete flag.

Describe the solution you'd like

Given the test, having a very easy way to auto-update result string

#[tokio::test]
async fn test_fn_upper() -> Result<()> {
    let expr = upper(col("a"));

    let expected = [
        "+---------------+",
        "| upper(test.a) |",
        "+---------------+",
        "| ABCDEF        |",
        "| ABC123        |",
        "| CBADEF        |",
        "| 123ABCDEF     |",
        "+---------------+",
    ];
    assert_fn_batches!(expr, expected);

    Ok(())
}

Approach 1

One possible solution is writing the result to the file, and comparing it with similar to sqllogictest, but since we need to call expr API, the API calls remain in the rust test, and only the output goes to output file.

Approach 2

Based on https://github.com/apache/datafusion/issues/8736 We can switch between SQL string and Expr and compare the result like sqllogictest does

run_query may be like

    let ctx = SessionContext::new();
    // one csv table per test file with the same name
    // so tests/data/example.csv is the table for tests in tests/data/example.slt
    let df = ctx.read_csv("tests/data/example.csv", CsvReadOptions::new()).await?;
    // sql to expr
    let expr = ascii(col("a"));
    let df = df.select(vec![expr])?.collect().await?;
    // check the values like sqllogictest
    assert_eq!(df, "expected string");

Describe alternatives you've considered

No response

Additional context

No response

alamb commented 6 months ago

Something we have used to great effect in influxdb is https://insta.rs/

You can then do the equivalent of sqllogictest --complete (even for results within files) with a command like

cargo insta review

Some downsides are that it is is yet another dependency (and to use it you need to install cargo install cargo-insta

PsiACE commented 5 months ago

I think this might be another approach, and we use it in databend. https://crates.io/crates/goldenfile

UPDATE_GOLDENFILES=1 cargo test

If we decide to use a specific approach, I think I can try to take this issue.

jayzhan211 commented 5 months ago

I think this might be another approach, and we use it in databend. https://crates.io/crates/goldenfile

UPDATE_GOLDENFILES=1 cargo test

If we decide to use a specific approach, I think I can try to take this issue.

We could give it a try! As long the test is easy to maintain, I will take it.

The goal of this issue is to fix #10364 .

I stalled on the progress because #10364 is not a high priority issue (we can easily add alias to avoid the issue + it is probably solved in #11020), and since datafusion/example is not test, so I can not easily update it with cargo insta. Therefore, I just leave it there for now. I hope goldenfile could make datafusion/example easy to update too!

PsiACE commented 5 months ago

I think goldenfile can fulfill this requirement. I'll create some examples and submit a PR for evaluation. If everything works well, we can proceed with migrating the dataframe tests and examples. Please assign me this issue. cc @alamb @jayzhan211

alamb commented 5 months ago

Thanks @PsiACE and @jayzhan211

jayzhan211 commented 4 months ago

@PsiACE It would also be interesting that how much the size increase after the change #11105