delta-io / delta-rs

A native Rust library for Delta Lake, with bindings into Python
https://delta-io.github.io/delta-rs/
Apache License 2.0

Insert into Parquet file results in a lot of files? #2781

Closed yuyang-ok closed 3 months ago

yuyang-ok commented 3 months ago

Our project uses Delta Lake as a TableProvider, and inserting into the Delta table results in a lot of small files.

My code looks roughly like this:

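        let table_path = self.table_path.clone();
        let data = self.data.clone();
        // Total number of rows across all record batches, used only for logging.
        let total_rows: usize = data.iter().map(|r| r.num_rows()).sum();
        // Wrap the insert in a single-item stream.
        let stream = futures::stream::once(async move {
            // Open the existing table and write the batches to it.
            let delta_table = open_table(&table_path).await?;
            let _ = DeltaOps(delta_table).write(data).await?;
            info!("Inserted into {} {} rows", table_path, total_rows);
            // Return an empty batch as the result of the insert.
            Ok(RecordBatch::new_empty(Arc::new(Schema::empty())))
        })
        .boxed();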

How can I append to existing files instead of creating new files every time?