Open scotts opened 4 years ago
Maybe a capability matrix like https://beam.apache.org/documentation/runners/capability-matrix/ could be useful. I'm not sure how easy it would be to write it in markdown though.
I think it would also be very interesting to have a benchmark between the implementations.
I think a capability matrix is a useful way to categorize the algorithms, as we have several dimensions we care about (operations supported; in-order required; algorithmic analysis; space required; implementations). I'm worried, thought, that is a lot to parse up-front. It might be easier to maintain and read it in a more prose-like style rather than in one large table. Maybe something like (these are all bogus URLs):
full name: De-Amortized Banker's Aggregator ordering: FIFO operator requirements: associativity time complexity: worst-case O(1) space requirements: 2n first appeared: [Low-Latency Sliding-Window Aggregation in Worst-Case Constant Time]() implementions: C++, Rust
full name: Finger B-Tree Aggregator ordering: out-of-order allowed, requires timestamps operator requirements: associativity time complexity: worst-case O(log n d) where d is distance from being in-order; reduces to worst-case O(log n) when d=0; and average-case O(log d) and reduces to average-case O(1) when d=0 space requirements: O(n) first appeared: [Optimal and General Out-of-Order Sliding-Window Aggregation]() implementions: C++, Rust, Java
Also, I think what to do about benchmarks warrants a new issue.
This looks great so far! A few tweaks:
Instead of citing Jon Skeet on Stack Overflow we should cite adamax on Stack Overflow. The former only presents a LIFO stack, whereas the latter presents a FIFO queue.
Instead of saying FiBA requires space n, I would say it requires space O(n). The exact constant factor depends on the arity.
For SOE, we can say the time is worst-case O(1). This is of course predicated on the combine function being O(1), but we also assume that elsewhere. Furthermore, it is predicated on using a chunked-array deque as the underlying backing store.
One more thing I just noticed. We are saying "out-of-order allowed" for Recalc and for SOE. However, I think we currently only provide FIFO implementations for those two algorithms.
However, I think we currently only provide FIFO implementations for those two algorithms.
I thought about that, too. I think maybe we should make the qualification when we link to the implementation and keep the SWAG itself general.
This is looking good. Right now, the README file uses the terms average-case and worst-case to describe algorithms' running time. However, the term "amortized" (e.g., amortized O(log n) and amortized O(1)) would be a more precise characterization of the notion of averaging we use.
@ktangwon, what did we say the space requirements are for IOA?
@ktangwon, what did we say the space requirements are for IOA?
IOA keeps 2n partial aggregates (same as DABA) and additionally keeps a few extra pointers (between nodes and for suspended computations). It is hard to pin down the exact constant, but we can safely say IOA requires O(n) space.
In the current README, when we mention a particular algorithm, we link to its C++ implementation. This does not scale as we add implementations in new languages. We should come up with a format that makes it easier to easily point readers to implementations in a language they are interested in.