enso-org / enso

Hybrid visual and textual functional programming.
https://enso.org
Apache License 2.0
7.34k stars 323 forks source link

Add offset_value method and support to Enso #9096

Open Cassandra-Clark opened 7 months ago

Cassandra-Clark commented 7 months ago

As a user, I want to be able to perform operations that calculate across multiple rows. This enables a wide variety of use cases, such as data cleanup and advanced running calculations.

The offset method offers a standard method for performing these types of calculations. Databases such as sql server offer methods like lead, lag, and over by, but these are often difficult for new users to understand.

There are two options for this: Option 1: By creating offset_value as a method for referencing a prior value via the set method, we can enable powerful functional capability directly within set without needing to create a more complex design.

The proposed API is as follows:

offset_value : Column -> Integer -> Group_By -> Sort -> If_Missing offset_value column rows_offset group_by=Nothing sort=Nothing if_missing=Nothing

This method will optionally take a grouping argument group_by. It will use the existing order of the table otherwise.

The rows_offset value can be positive or negative and is the relative index reference of the offset value being retrieved. For example, on index 10, a rows_offset of -1 will return the value of the column at index 9, and a rows_offset of 1 will return the value of the column at index 11.

Column allows a dropdown selection of the column that an offset value is being retrieved from.

if_missing allows a set of options for default behavior if no value is returned, for example if there is no value at the referenced location in the table, such as referencing index-1 from index 0. It has three options:

Null 0 or Empty - if the field is a number, returns 0, if the field is a string returns an empty value Closest value - returns the closest applicable value from the table

rows_offset will return the value from the other specified row for use in a formula inside of set

Option 2 Alternatively, we can create offset_value as a standalone function which returns a new column with the offset values. This is a simpler approach technically, and allows a user to leverage the existing formula functions by referencing the desired offsets. It can support sorting and grouping. The primary downside to this approach is that while it is simple, it is verbose, requiring many steps (IE, to check row-1:value and row-2:value, you need to create two offset_value nodes, then use set to define the value, and then remove_columns to get rid of the extraneous values.

This api would look something like this: offset_value : Column -> Integer -> Group_By -> Sort -> Text -> If_Missing offset_value column rows_offset group_by=Nothing sort=Nothing new_name=Nothing if_missing=Nothing

This method optionally takes a grouping argument, a sort argument, and a new_name argument. If these are Nothing, the grouping and sort would take incoming record order, and the newname would be automatically assigned as "offset(rowsoffset)(column)", IE a column "value" with a rowsoffset of -1 would be automatically named to "offset-1_value"

Cassandra-Clark commented 3 months ago

Per meeting 2024-06-28 - Chose to go forward with offset_value as a Table function and a Column function, as well as adding support for expression editor.