LaurentRDC / javelin

Haskell implementation of series, or labeled one-dimensional arrays.
https://hackage.haskell.org/package/javelin

Feature request: a lazy interface #2

Open LaurentRDC opened 11 months ago

LaurentRDC commented 11 months ago

A discussion on the Haskell Discourse has brought up a use case that cannot be handled with javelin 0.1.0.0:

In a huge storage I have time series of a thousand devices over a year, approximately in second resolution. I want to compute the temporal mean of the standard deviation over devices of the time series. There is no way to hold all these time series in memory at the same time, so I need some lazy streaming API.
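To make the use case concrete, here is a minimal sketch in plain Haskell (not javelin's API) of the computation as a single streaming pass. It assumes the per-device series have already been aligned into a lazy list of rows, one row per timestamp holding one reading per device. The names `rowStdDev` and `temporalMeanOfStdDev` are hypothetical, and the population standard deviation is used. As long as the list itself is produced lazily, only the current row needs to be in memory.

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.List (foldl')

-- | Population standard deviation of one row, i.e. the readings of
-- all devices at a single timestamp (hypothetical helper).
-- A row of ~1000 values is small, so two passes over it are fine.
rowStdDev :: [Double] -> Double
rowStdDev xs = sqrt (sum [(x - mu) ^ (2 :: Int) | x <- xs] / n)
  where
    n  = fromIntegral (length xs)
    mu = sum xs / n

-- | Temporal mean of the per-timestamp standard deviation.
-- The strict left fold keeps only a running (count, total) pair,
-- so rows already consumed can be garbage collected.
temporalMeanOfStdDev :: [[Double]] -> Double
temporalMeanOfStdDev rows = total / fromIntegral count
  where
    (count, total) = foldl' step (0 :: Int, 0) rows
    step (!c, !t) row = (c + 1, t + rowStdDev row)
```

For example, `temporalMeanOfStdDev [[1, 3], [2, 2], [0, 4]]` evaluates to `1.0`, since the per-timestamp standard deviations are 1, 0 and 2.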

The Rust dataframe library Polars also has two interfaces, one strict (Polars calls it "eager") and one lazy. We could take inspiration from it.
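In Polars' lazy API, operations only record a query plan, which is optimized and then executed when `collect` is called. The following is a hypothetical Haskell analogue, not an existing javelin API: `Plan`, `optimize` and `collect` are illustrative names, and merging adjacent filters merely stands in for real query optimizations such as predicate pushdown.

```haskell
-- | Hypothetical description of a computation over a stream of values.
-- Building a Plan performs no work.
data Plan
  = Source [Double]              -- where the data comes from
  | Filter (Double -> Bool) Plan -- keep values satisfying a predicate
  | Map (Double -> Double) Plan  -- transform every value

-- | Rewrite the plan before running it. Here, two adjacent filters are
-- merged into one predicate; the inner predicate is still tested first.
optimize :: Plan -> Plan
optimize (Filter p (Filter q rest)) = optimize (Filter (\x -> q x && p x) rest)
optimize (Filter p rest)            = Filter p (optimize rest)
optimize (Map f rest)               = Map f (optimize rest)
optimize src@(Source _)             = src

-- | Execute the (optimized) plan; only now is the source consumed.
collect :: Plan -> [Double]
collect = go . optimize
  where
    go (Source xs)     = xs
    go (Filter p rest) = filter p (go rest)
    go (Map f rest)    = map f (go rest)
```

For instance, `collect (Filter (< 10) (Filter (> 2) (Map (* 2) (Source [1, 2, 3, 4, 5]))))` yields `[4.0,6.0,8.0]`, with the two filters applied as a single predicate.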

LaurentRDC commented 4 months ago

Several sequence types come in two flavours: a strict type, and a lazy type that is a lazy sequence of strict chunks. ByteString and Text are examples.
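A lazy Series following the same pattern as lazy ByteString could look like the sketch below. It is only an illustration under stated assumptions: `LazySeries`, `fromChunks`, `toPairs` and `foldlValues'` are hypothetical names, and `Data.Map.Strict` from containers stands in for javelin's strict Series, since it likewise keeps keys ordered and unique within each chunk.

```haskell
import Data.List (foldl')
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

-- | Hypothetical lazy series: a lazy list of strict chunks, in the
-- spirit of lazy ByteString and lazy Text. Data.Map plays the role
-- of a strict Series (ordered, unique keys within each chunk).
newtype LazySeries k a = LazySeries [Map k a]

-- | Wrap a (possibly infinite) lazy list of chunks; no chunk is forced.
fromChunks :: [Map k a] -> LazySeries k a
fromChunks = LazySeries

-- | All (key, value) pairs, produced chunk by chunk on demand.
toPairs :: LazySeries k a -> [(k, a)]
toPairs (LazySeries chunks) = concatMap Map.toList chunks

-- | Strict left fold over every value, one chunk at a time; chunks
-- already folded can be garbage collected if nothing else holds them.
foldlValues' :: (b -> a -> b) -> b -> LazySeries k a -> b
foldlValues' f z (LazySeries chunks) = foldl' (Map.foldl' f) z chunks
```

Because `toPairs` is lazy, `take 3 (toPairs (fromChunks [Map.singleton i i | i <- [1 ..]]))` terminates even though the list of chunks is infinite.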

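However, the strict Series in this package always have ordered, unique indexes, and this is key to performance. If we had a lazy Series type, we could not guarantee that its index is both ordered and unique: each strict chunk would satisfy these properties on its own, but keys could repeat or go out of order across chunk boundaries, and ruling this out would require traversing the whole series.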
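The boundary problem can be made concrete with a small check over a lazy list of strict chunks. This is a hypothetical sketch, not part of javelin: `firstBoundaryViolation` is an illustrative name, and `Data.Map.Strict` again stands in for a strict Series chunk, so ordering and uniqueness hold inside each chunk by construction.

```haskell
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

-- | Hypothetical check: within a chunk, Data.Map already guarantees
-- ordered, unique keys; across chunks nothing does. Returns the first
-- boundary (previous maximum key, next minimum key) where keys fail to
-- strictly increase, or Nothing if the whole stream is ordered and unique.
firstBoundaryViolation :: Ord k => [Map k a] -> Maybe (k, k)
firstBoundaryViolation = go Nothing
  where
    -- prevMax is the largest key seen so far (Nothing before any key).
    go _ [] = Nothing
    go prevMax (chunk : rest) =
      case (prevMax, Map.lookupMin chunk) of
        (Just hi, Just (lo, _))
          | lo <= hi -> Just (hi, lo)
        -- Empty chunks leave prevMax unchanged.
        _ -> go (maybe prevMax (Just . fst) (Map.lookupMax chunk)) rest
```

A violation is reported as soon as it is reached, even in an infinite stream, but confirming that there is none means consuming every chunk. A lazy Series therefore cannot establish the guarantee up front.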

The question is: should we drop the uniqueness and ordering guarantees of the strict variant, as pandas does? In that case, lazy and strict Series would have very similar semantics.
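One way to see what would change is to compare the types of a lookup under each set of semantics. In this sketch, `Data.Map.Strict` models an ordered, unique index and a plain association list models a pandas-like index that may be unordered and contain duplicates; `lookupUnique` and `lookupAll` are hypothetical names, not javelin functions.

```haskell
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

-- | Ordered, unique index: at most one value per key,
-- found in O(log n) time by binary search.
lookupUnique :: Ord k => k -> Map k a -> Maybe a
lookupUnique = Map.lookup

-- | Unordered index with possible duplicates (pandas-like):
-- every matching value is returned, and finding them takes a linear scan.
lookupAll :: Eq k => k -> [(k, a)] -> [a]
lookupAll key pairs = [v | (k, v) <- pairs, k == key]
```

Giving up the guarantees turns `Maybe a` into `[a]` and logarithmic lookups into linear scans, unless additional indexing structures are maintained.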