cda-group / arc

Programming Language for Continuous Deep Analytics
https://cda-group.github.io/arc/

RFC: Dynamic arrays #290

Closed segeljakt closed 1 year ago

segeljakt commented 3 years ago

Part of #260. Dynamic arrays are found in most imperative languages, such as Python, Rust, Java, and C++. They are sometimes referred to as lists (Python) or vectors (Rust, C++). A dynamic array generally has the following properties:

Use cases

A use case is to support "holistic windows", which are windows over a stream whose aggregate can only be calculated once the whole window has been observed. For example, to calculate the median of every n consecutive elements (sorting the window before indexing is omitted for brevity):

task Median(n: u32): ~i32 -> ~i32 {
    val window = [];
    loop {
        for i in 0..n {
            on event => window[i] = event;
        }
        if n % 2 == 0 {
            emit (window[n/2 - 1] + window[n/2]) / 2;
        } else {
            emit window[n/2];
        }
    }
}
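
For comparison, here is a minimal Rust sketch of the same holistic aggregation over a plain `Vec<i32>`. It is not part of Arc: the function name `window_medians` and the choice to drop a trailing partial window are assumptions made for illustration. It makes the sorting step explicit and uses 0-based indexing for the two middle elements.

```rust
/// Hypothetical helper: splits `events` into consecutive windows of `n`
/// elements and returns the median of each complete window.
/// A trailing partial window is dropped (an assumption of this sketch).
fn window_medians(events: impl IntoIterator<Item = i32>, n: usize) -> Vec<i32> {
    assert!(n > 0, "window size must be positive");
    let mut window: Vec<i32> = Vec::with_capacity(n);
    let mut medians = Vec::new();
    for event in events {
        window.push(event);
        if window.len() == n {
            // The aggregate is holistic: it needs the whole window, sorted.
            window.sort_unstable();
            let mid = n / 2;
            let median = if n % 2 == 0 {
                // Even size: average the two middle elements.
                // Widen to i64 so the sum cannot overflow; the division
                // truncates toward zero.
                ((window[mid - 1] as i64 + window[mid] as i64) / 2) as i32
            } else {
                window[mid]
            };
            medians.push(median);
            // Keep the allocation and start the next window.
            window.clear();
        }
    }
    medians
}
```

Calling `window_medians(vec![3, 1, 2, 9, 7, 8], 3)` yields one median per window of three elements. The buffer only ever grows at the back and is cleared as a whole, so no elements are shifted.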

Another example of a holistic aggregation is training a machine learning model. I think, though, that we should have a separate tensor type for this use case, one that can support higher-dimensional structures like ndarray.

Operations

Python's lists support the following operations:

Implementation

The dynamic array could be implemented by wrapping a Rust Vec<T>. Rust vectors are fast as long as elements do not have to be shifted, which happens when inserting or removing anywhere but at the end. Another option is the Rust VecDeque<T>, which allows amortized O(1) insertion at the front and back at the cost of some overhead. A third option is to use an unrolled linked list (or chunked array queue), which is a linked list of vectors, to support more dynamic access patterns.
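
To make the trade-offs concrete, here is a rough Rust sketch of the three options. `Vec::insert`, `VecDeque::push_front`, and `LinkedList` are standard library APIs. The `ChunkedList` type, its fields, and its methods are hypothetical names invented for this illustration, and the design (fixed chunk capacity, append-only) is just one possible way to build an unrolled linked list.

```rust
use std::collections::{LinkedList, VecDeque};

/// Front insertion into a Vec shifts every existing element: O(n).
fn push_front_vec(v: &mut Vec<i32>, x: i32) {
    v.insert(0, x);
}

/// Front insertion into a VecDeque (a ring buffer): amortized O(1).
fn push_front_deque(d: &mut VecDeque<i32>, x: i32) {
    d.push_front(x);
}

/// Hypothetical unrolled linked list: a linked list of bounded vectors.
struct ChunkedList<T> {
    chunks: LinkedList<Vec<T>>,
    chunk_cap: usize,
    len: usize,
}

impl<T> ChunkedList<T> {
    fn new(chunk_cap: usize) -> Self {
        assert!(chunk_cap > 0, "chunk capacity must be positive");
        ChunkedList { chunks: LinkedList::new(), chunk_cap, len: 0 }
    }

    /// Appends at the back. A new chunk is allocated when the last one is
    /// full, so existing elements are never moved or reallocated.
    fn push_back(&mut self, value: T) {
        let need_chunk = match self.chunks.back() {
            Some(last) => last.len() == self.chunk_cap,
            None => true,
        };
        if need_chunk {
            self.chunks.push_back(Vec::with_capacity(self.chunk_cap));
        }
        // A back chunk with free space is guaranteed to exist here.
        self.chunks.back_mut().unwrap().push(value);
        self.len += 1;
    }

    /// Indexing walks the chunks, so it costs O(len / chunk_cap), not O(1).
    fn get(&self, mut index: usize) -> Option<&T> {
        for chunk in &self.chunks {
            if index < chunk.len() {
                return chunk.get(index);
            }
            index -= chunk.len();
        }
        None
    }

    fn len(&self) -> usize {
        self.len
    }
}
```

The chunked variant trades O(1) indexing for growth without reallocation, which may or may not pay off depending on how often windows are indexed versus appended to.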