cda-group / arcon

State-first Streaming Applications in Rust
https://cda-group.github.io/arcon/
Apache License 2.0
175 stars 17 forks source link

Data Sharding #286

Open Max-Meldrum opened 2 years ago

Max-Meldrum commented 2 years ago

The following are some aspects we need to address/implement to enable data parallelism:

segeljakt commented 2 years ago

One question, should it be possible for operators to be generic over KeyedStream<K, T> and Stream<T>? For example, the map operator in:

let stream0: Stream<_> = ...;
let stream1: Stream<_> = stream.map(...);

let stream2: KeyedStream<_, _> = ...;
let stream3: KeyedStream<_, _> = stream.map(...);

If not, we need special operator implementations to handle each kind of stream. In Flink they use subtyping. Although Rust does not have it, we could maybe use traits to achieve something similar:

trait Stream<T> { ... }
trait KeyedStream<K, T>: Stream<T> { ... }
Max-Meldrum commented 2 years ago

One question, should it be possible for operators to be generic over KeyedStream<K, T> and Stream<T>? For example, the map operator in:

let stream0: Stream<_> = ...;
let stream1: Stream<_> = stream.map(...);

let stream2: KeyedStream<_, _> = ...;
let stream3: KeyedStream<_, _> = stream.map(...);

If not, we need special operator implementations to handle each kind of stream. In Flink they use subtyping. Although Rust does not have it, we could maybe use traits to achieve something similar:

trait Stream<T> { ... }
trait KeyedStream<K, T>: Stream<T> { ... }

Would be nice to have the same map on both. Yeah, may need to use traits in that way. Something to discuss tomorrow 😄