delta-io / delta-rs

A native Rust library for Delta Lake, with bindings into Python
https://delta-io.github.io/delta-rs/
Apache License 2.0
2.14k stars 379 forks

Schema evolution on upsert (merge) #2282

Open ion-elgreco opened 5 months ago

ion-elgreco commented 5 months ago

Discussed in https://github.com/delta-io/delta-rs/discussions/2281

Originally posted by **cesar-vermeulen** March 11, 2024

Hello! With the awesome addition of merge schema support in the write operation (kudos to the contributors @ https://github.com/delta-io/delta-rs/pull/2246), I was wondering whether similar functionality is on the roadmap for upsert transactions? It would be a great addition to the current functionality! Thanks, cheers!

JonasDev1 commented 5 months ago

In order for this function to be useful, we first need to implement an updateAll / insertAll functionality. Currently you need to specify all updates and inserts manually.

In the future, something like this would be helpful together with schema evolution:

let (table, metrics) = DeltaOps(table)
  .merge(source, "target.id = source.id")
  .with_source_alias("source")
  .with_target_alias("target")
  .when_matched_update(|update| {
    update.updateAll()
  }).unwrap()
  .when_not_matched_insert(|insert| {
    insert.insertAll()
  }).unwrap()
  .await
  .unwrap();
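
To illustrate what an updateAll / insertAll would have to generate, here is a minimal sketch in plain Rust. The helper `expand_all` is hypothetical and not part of delta-rs; it assumes the source column names are already known and simply maps each target column to the same-named column under the source alias.

```rust
/// Hypothetical helper (not a delta-rs API): builds the
/// (target column, source expression) pairs that an
/// updateAll / insertAll would otherwise require you to write by hand.
fn expand_all(source_columns: &[&str], source_alias: &str) -> Vec<(String, String)> {
    source_columns
        .iter()
        .map(|col| (col.to_string(), format!("{source_alias}.{col}")))
        .collect()
}

fn main() {
    // Assumed example columns; a real implementation would read them
    // from the source schema.
    let assignments = expand_all(&["id", "name", "price"], "source");
    for (target, expr) in &assignments {
        println!("{target} = {expr}");
    }
}
```

With schema evolution, the same expansion would also have to cover source columns that do not yet exist in the target table.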

rtyler commented 3 weeks ago

@ion-elgreco I think with your recent improvements, we have this now, right?

ion-elgreco commented 3 weeks ago

> @ion-elgreco I think with your recent improvements, we have this now, right?

No, not on merge yet. We need to build it differently there, since it needs to be part of the projection expressions. We can reuse the merge schema functionality, though, to check whether the source and table schemas differ and whether they can be merged.
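
As a rough illustration of that check, here is a toy sketch in plain Rust. It is not the delta-rs implementation: `Schema`, `schema`, and `try_merge` are hypothetical names, and a schema is reduced to a flat map from column name to type name, ignoring nullability and nested types. It assumes two schemas are mergeable when every shared column has the same type, and that new source columns are simply appended.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for a table schema: column name -> type name.
type Schema = BTreeMap<String, String>;

/// Returns the merged schema, or an error describing the first column
/// whose type differs between target and source.
fn try_merge(target: &Schema, source: &Schema) -> Result<Schema, String> {
    let mut merged = target.clone();
    for (name, source_type) in source {
        match target.get(name) {
            // Shared column with a different type: cannot merge.
            Some(target_type) if target_type != source_type => {
                return Err(format!("column `{name}`: {target_type} vs {source_type}"));
            }
            // Shared column with the same type: nothing to do.
            Some(_) => {}
            // Column only present in the source: add it to the result.
            None => {
                merged.insert(name.clone(), source_type.clone());
            }
        }
    }
    Ok(merged)
}

/// Convenience constructor for the examples below.
fn schema(fields: &[(&str, &str)]) -> Schema {
    fields
        .iter()
        .map(|(name, ty)| (name.to_string(), ty.to_string()))
        .collect()
}

fn main() {
    let target = schema(&[("id", "long"), ("name", "string")]);
    let source = schema(&[("id", "long"), ("price", "double")]);
    match try_merge(&target, &source) {
        Ok(merged) => println!("merged columns: {:?}", merged.keys().collect::<Vec<_>>()),
        Err(e) => println!("cannot merge: {e}"),
    }
}
```

In a merge, the result of such a check would then presumably feed into the projection expressions, so that columns missing on one side are filled in rather than rejected.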