apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0

INSERT INTO SQL failing on CSV-backed table #10324

Open singularsyntax opened 2 months ago

singularsyntax commented 2 months ago

Describe the bug

Hello,

When I try to insert data with the INSERT INTO SQL syntax (see reproduction code below), I get the error: Inserting query must have the same schema with the table.

[2024-05-01T00:48:23Z INFO] TABLE SCHEMA: DFSchema { fields: [DFField { qualifier: Some(Bare { table: "test" }), field: Field { name: "k", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} } }, DFField { qualifier: Some(Bare { table: "test" }), field: Field { name: "v", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} } }], metadata: {}, functional_dependencies: FunctionalDependencies { deps: [FunctionalDependence { source_indices: [0], target_indices: [0, 1], nullable: false, mode: Single }] } }
[2024-05-01T00:48:23Z INFO] DATAFRAME SCHEMA: DFSchema { fields: [DFField { qualifier: None, field: Field { name: "k", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} } }, DFField { qualifier: None, field: Field { name: "v", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} } }], metadata: {}, functional_dependencies: FunctionalDependencies { deps: [] } }
thread 'main' panicked at src/main.rs:317:88:
called `Result::unwrap()` on an `Err` value: Plan("Inserting query must have the same schema with the table.")

As logged above, the problem seems to be a discrepancy between the two schemas: the fields of the table schema are qualified with the table name, while the fields of the query schema are not.

The code I'm using is about as simple as I can imagine. Am I missing something? Is there some example code that demonstrates how to use INSERT INTO SQL correctly? Or is this a bug?

To Reproduce

use datafusion::dataframe::DataFrameWriteOptions;
use datafusion::prelude::*;
use log::info;

async fn df_test() {
    let ctx = SessionContext::new();
    let sql = "CREATE EXTERNAL TABLE test (k VARCHAR PRIMARY KEY NOT NULL, v VARCHAR NOT NULL) STORED AS CSV LOCATION './store/test/'";
    let df = ctx.sql(sql).await.unwrap();

    df.collect().await.unwrap();

    let table_df = ctx.table("test").await.unwrap();
    info!("TABLE SCHEMA: {:?}", table_df.schema());

    let sql = "INSERT INTO test (k, v) VALUES ('foo', 'bar')";
    let query_df = ctx.sql(sql).await.unwrap();
    info!("DATAFRAME SCHEMA: {:?}", query_df.schema());

    let _result = query_df.write_table("test", DataFrameWriteOptions::default()).await.unwrap();
}

Expected behavior

Insertion of the row ('foo', 'bar') is successful. DataFusion creates a CSV file in the filesystem corresponding to the inserted data.

Additional context

[dependencies]
datafusion = "37.1.0"

singularsyntax commented 2 months ago

Additional information:

If I replace the call to write_table() with write_csv():

let _result = query_df.write_csv("foo", DataFrameWriteOptions::default(), None).await.unwrap();

I get the following error:

thread 'main' panicked at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/datafusion-physical-plan-37.1.0/src/insert.rs:127:9:
assertion `left == right` failed
  left: 2
 right: 1
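
One possible reading of the `left: 2` / `right: 1` values, offered as an assumption rather than a confirmed diagnosis: an INSERT statement produces a single `count` column when executed, so passing its DataFrame to a writer would wrap a one-column plan in a sink that expects the table's two columns. The toy model below uses only hypothetical types (`Schema`, `Plan`, `insert_into`), none of which are DataFusion's API, and assumes the failing check compares column counts.

```rust
// Toy model only: every type and function here is hypothetical,
// not DataFusion's API.

/// A schema reduced to its column names.
#[derive(Debug, Clone, PartialEq)]
struct Schema(Vec<String>);

impl Schema {
    fn new(cols: &[&str]) -> Self {
        Schema(cols.iter().map(|c| c.to_string()).collect())
    }
}

/// A plan node, reduced to what matters for the rows it produces.
#[allow(dead_code)]
#[derive(Debug)]
enum Plan {
    /// Literal rows, e.g. VALUES ('foo', 'bar').
    Values(Schema),
    /// An insert: consumes `input` and emits a single `count` column.
    Insert { table: Schema, input: Box<Plan> },
}

impl Plan {
    fn output_schema(&self) -> Schema {
        match self {
            Plan::Values(schema) => schema.clone(),
            Plan::Insert { .. } => Schema::new(&["count"]),
        }
    }
}

/// Wrap `input` in an insert after comparing column counts.
/// Assumption: the real check is something of this shape.
fn insert_into(table: &Schema, input: Plan) -> Result<Plan, String> {
    let left = table.0.len();
    let right = input.output_schema().0.len();
    if left != right {
        return Err(format!("column count mismatch: left: {left}, right: {right}"));
    }
    Ok(Plan::Insert {
        table: table.clone(),
        input: Box::new(input),
    })
}

fn main() {
    let table = Schema::new(&["k", "v"]);

    // INSERT INTO test (k, v) VALUES ('foo', 'bar'): two columns in, accepted.
    let insert = insert_into(&table, Plan::Values(Schema::new(&["k", "v"]))).unwrap();
    println!("{:?}", insert.output_schema());

    // Writing the INSERT's own result into the table: one column in, rejected.
    let nested = insert_into(&table, insert);
    println!("{:?}", nested);
}
```

In this model the second call fails with `left: 2, right: 1`, the same pair of numbers as the assertion above. Whether the real failure follows this path is not established in this thread.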

phillipleblanc commented 2 months ago

This looks like a bug. I wonder if this is a regression from #9595?

yyy1000 commented 2 months ago

I think it's a latent bug that isn't related to #9595; I tested with the version 36 code. I can try to help figure out what's wrong. :)