Open andrei-ionescu opened 3 months ago
Adding support for Delta as a source would be great! The Delta sink connector is implemented on top of the filesystem connector, since most of the complexity is in consistently writing the data to S3 (see https://www.arroyo.dev/blog/streaming-to-s3-is-hard), not in handling the Delta metadata.
Most of the delta code is here: https://github.com/ArroyoSystems/arroyo/blob/master/crates/arroyo-connectors/src/filesystem/sink/delta.rs. It's integrated into the filesystem connector's two-phase commit handler in https://github.com/ArroyoSystems/arroyo/blob/master/crates/arroyo-connectors/src/filesystem/sink/mod.rs.
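To illustrate the two-phase commit idea in general terms, here is a minimal in-memory sketch. It is not Arroyo's code and does not use delta-rs: `ToyTable`, `pre_commit`, `commit`, and `visible_files` are hypothetical names made up for this example, and the "log" is just a vector standing in for a Delta-style transaction log. The assumption is that data files are staged per checkpoint epoch and only become visible once a single log entry publishes them.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for an object store plus a Delta-style commit log.
/// Hypothetical type for illustration only; not Arroyo's or delta-rs's API.
#[derive(Default)]
pub struct ToyTable {
    /// Files finished in phase one, keyed by checkpoint epoch; not yet visible.
    staged: BTreeMap<u64, Vec<String>>,
    /// Ordered log of committed versions; each entry lists the files it added.
    log: Vec<Vec<String>>,
    /// Highest epoch already committed, used to make `commit` idempotent.
    last_committed_epoch: Option<u64>,
}

impl ToyTable {
    /// Phase one: record a finished data file for `epoch` without publishing it.
    pub fn pre_commit(&mut self, epoch: u64, file: &str) {
        self.staged.entry(epoch).or_default().push(file.to_string());
    }

    /// Phase two: publish every file staged for `epoch` as one new table version.
    /// Returns the new version number, or `None` if the epoch was already
    /// committed or has nothing staged (so a retry after a crash is harmless).
    pub fn commit(&mut self, epoch: u64) -> Option<usize> {
        if self.last_committed_epoch.map_or(false, |e| e >= epoch) {
            return None;
        }
        let files = self.staged.remove(&epoch)?;
        self.log.push(files);
        self.last_committed_epoch = Some(epoch);
        Some(self.log.len() - 1)
    }

    /// Files a reader would see: only those referenced by committed log entries.
    pub fn visible_files(&self) -> Vec<&str> {
        self.log.iter().flatten().map(|s| s.as_str()).collect()
    }
}
```

In this sketch, calling `pre_commit` twice for epoch 1 and then `commit(1)` publishes both files as version 0, and a repeated `commit(1)` is a no-op. A real sink would additionally have to persist the staged state in checkpoints and write the log entry atomically to object storage.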
A Delta Lake source could be implemented using the delta-rs library.
I've also seen that the documentation lists a Delta Lake Sink connector (https://doc.arroyo.dev/connectors/delta), but I couldn't find it in this repository. Where can I find the Delta Lake Sink connector? If it lives under another connector, should we make it a first-class citizen?