finos / perspective

A data visualization and analytics component, especially well-suited for large and/or streaming datasets.
https://perspective.finos.org/
Apache License 2.0
7.72k stars 1.04k forks source link

Cumulative sum? #2624

Open limx0 opened 1 month ago

limx0 commented 1 month ago

Bug Report

I'm trying to calculate a cumulative sum for a particular column by using index, vlookup and for.

Steps to Reproduce:

  1. Open data table on perspective homepage
  2. Add new column with following expression
    
    var idx := 0; 
    var cum_bid := 0;

for (idx <= index(); idx += 1;) { cum_bid += vlookup("bid", idx) }

cum_bid



### Expected Result:
Cumulative bid values are calculated

### Actual Result:
`Invalid expression encountered`

### Environment:
NA: website

### Additional Context:
Is there another (better) way to calculate an accumulation/cumulative sum ?
texodus commented 1 month ago

There are a few conflated issues here.

  1. First off, the index() function appears to have some issues with real-time data, specifically, Perspective's Table.update() method has an exception for the example on Perspective's homepage, whereby simply creating the View with index() as an expression works fine. This is a straight-up bug in Perspective, I've an Issue #2627. This bug will prevent the usage of this function with real-time data.

  2. In the loop condition for (idx <= index(); idx += 1;) {, your semicolons are in the wrong place. This line should read for (; idx <= index(); idx + 1) {, but as written this is an infinite loop as the condition idx + 1 is always true (the condition should be idx <= index() instead). Perspective 2.10.1 uses a portion of the ExprTK eval code to perform type checking, and while it does not naively evaluate loops, this particular infinite loop locks the engine (due to inability to infer idx + 1 is const true) when it tries to validate the expression, e.g. as you type. This is a curious state for Perspective though, as only the engine is locked - the UI will continue to function, but no queries will complete (and no further expression validation errors will be reported) TL;DR, if you type this expression into Perspective, the engine will go into an infinite loop immediately and no further errors will show up in the UI, meaning the error you see is not necessarily the real error!

    So a few issues here, the type checker is both not providing the feedback that the for loop syntax is wrong and it is locking on syntax, Issue #2626.

  3. In the expression vlookup("bid", idx), vlookup() takes a column name as an argument, not a column value. In Perspective ExprTK 2.10, double quotes " denote the value of that column when the expression is evaluated for a row, and thus the expression vlookup("bid", idx) is passing the value of the "bid" column to vlookup(), not the column name 'bid' with single quotes, which is Perspective ExprTK's string literal syntax.

  4. for loops have a syntax ambiguity in ExprTK, as a for loop is an expression itself, equal to whatever the last expression in the loop evaluated to. It seems to be able to parse for ( .. ) { .. } integer(x) or for ( .. ) { .. }; x (with a semicolon), but it does not like for ( .. ) { .. } x (without a semicolon). I need to dig into this a bit, not sure if this is a bug in ExprTK itself or our configuration of it.

This version of the expression should work, but you'll need to drag the real-time slider all the way to the left to "paused" because (1) will cause the engine to fail if a real-time update is applied to this expression:

var cum_bid := 0;
for (var idx := 0; idx < index(); idx += 1) {
    cum_bid += vlookup('bid', idx);
};

cum_bid

Another thing to note that you will likely run into with this is Perspective ExprTK's handling of numeric types. Numeric literals in Perspective ExprTK are always assumed to be "float", unless they come from an "integer" column or are cast via the integer() function. In the example on Perspective's homepage, the Table has a "float" index type, as it was inferred from JSON input. However, for Tables without an explicit index, the internal implicit index is typed "integer". Furthermore, Perspective ExprTK numeric operators do not promote types automatically, so with an implicit index, idx <= index() would roughly be interpreted as (float value) <= (integer value), which will always fail. You can overcome this by unifying the types with the cast function integer(), as in integer(idx) < index() and vlookup("bid", integer(idx)).

We have a plan to improve Perspective ExprTK's error-reporting/type-checking in the near term. However, we also have a plan to resurrect #2102 , which will allow efficient and sort aware cumulative sum calculation, rather than just the native index ordering. We'll elaborate on these plans in a separate Issue.

limx0 commented 1 month ago

Hi @texodus,

Thank you for the detailed response! And for the usage tips.

I can confirm the above gives me the desired output for my basic use case (I am not live updating, nor using any sorting ATM), but I will leave this open until #2102 or something similar solves this in the general case.