scotts commented 4 years ago

In the current README, when we mention a particular algorithm, we link to its C++ implementation. This does not scale as we add implementations in new languages. We should come up with a format that makes it easier to easily point readers to implementations in a language they are interested in.

segeljakt commented 4 years ago

Maybe a capability matrix like https://beam.apache.org/documentation/runners/capability-matrix/ could be useful. I'm not sure how easy it would be to write it in markdown though.

I think it would also be very interesting to have a benchmark between the implementations.

scotts commented 4 years ago

I think a capability matrix is a useful way to categorize the algorithms, as we have several dimensions we care about (operations supported; in-order required; algorithmic analysis; space required; implementations). I'm worried, thought, that is a lot to parse up-front. It might be easier to maintain and read it in a more prose-like style rather than in one large table. Maybe something like (these are all bogus URLs):

DABA

full name: De-Amortized Banker's Aggregator ordering: FIFO operator requirements: associativity time complexity: worst-case O(1) space requirements: 2n first appeared: [Low-Latency Sliding-Window Aggregation in Worst-Case Constant Time]() implementions: C++, Rust

FiBA

full name: Finger B-Tree Aggregator ordering: out-of-order allowed, requires timestamps operator requirements: associativity time complexity: worst-case O(log n d) where d is distance from being in-order; reduces to worst-case O(log n) when d=0; and average-case O(log d) and reduces to average-case O(1) when d=0 space requirements: O(n) first appeared: [Optimal and General Out-of-Order Sliding-Window Aggregation]() implementions: C++, Rust, Java

scotts commented 4 years ago

Also, I think what to do about benchmarks warrants a new issue.

hirzel commented 4 years ago

This looks great so far! A few tweaks:

Instead of citing Jon Skeet on Stack Overflow we should cite adamax on Stack Overflow. The former only presents a LIFO stack, whereas the latter presents a FIFO queue.
Instead of saying FiBA requires space n, I would say it requires space O(n). The exact constant factor depends on the arity.
For SOE, we can say the time is worst-case O(1). This is of course predicated on the combine function being O(1), but we also assume that elsewhere. Furthermore, it is predicated on using a chunked-array deque as the underlying backing store.

hirzel commented 4 years ago

One more thing I just noticed. We are saying "out-of-order allowed" for Recalc and for SOE. However, I think we currently only provide FIFO implementations for those two algorithms.

scotts commented 4 years ago

However, I think we currently only provide FIFO implementations for those two algorithms.

I thought about that, too. I think maybe we should make the qualification when we link to the implementation and keep the SWAG itself general.

ktangwon commented 4 years ago

This is looking good. Right now, the README file uses the terms average-case and worst-case to describe algorithms' running time. However, the term "amortized" (e.g., amortized O(log n) and amortized O(1)) would be a more precise characterization of the notion of averaging we use.

scotts commented 4 years ago

@ktangwon, what did we say the space requirements are for IOA?

ktangwon commented 4 years ago

@ktangwon, what did we say the space requirements are for IOA?

IOA keeps 2n partial aggregates (same as DABA) and additionally keeps a few extra pointers (between nodes and for suspended computations). It is hard to pin down the exact constant, but we can safely say IOA requires O(n) space.

IBM / sliding-window-aggregators

Generalize algorithm links per language in README #14

DABA

FiBA