[Discussion] Floating point precision

We're currently using data-forge for a fintech project and we're loving it! Current usage is to aggregate a list of financial transactions by month and create a sum, basically giving us a time-series representation of our data.

This works great, however JavaScript is notorious for (incorrectly) handling floating points and losing precision in the process. Unfortunately, when working with financial data you'd need that floating point precision sweetness.

Relevant snippet:

const result = data.groupBy((row) => moment(row.due_date).startOf(aggregation).format("YYYY-MM"))
  .select((group) => ({
    type: group.first().type,
    due_date: group.first().due_date,
    amount: group.deflate((row) => row.amount).sum()
  }))

Which will return something like

__index__   type     due_date    amount             
----------  -------  ----------  -------------------
2020-01-01  payable  2020-01-01  -300046.7899999998 
2020-02-01  payable  2020-02-01  -399881.57999999955
2020-03-01  payable  2020-03-01  -4724.579999999969 
2020-04-01  payable  2020-04-01  29564.520000000066 
2021-01-01  payable  2021-01-01  1750

As you can see, the floating point precision is not very favourable. Looking at the code, the sum function does exactly what you'd expect as it just sums up two number using plain JavaScript.

https://github.com/data-forge/data-forge-ts/blob/master/src/lib/series.ts#L4011-L4019

Floating point precision has been an issue for a long time and have been solved by several libraries already. I noticed this library already using numeral, which can handle floating point precision.

With that in mind, we wrote a custom aggregation function to use numeral to fix our floating point precision.

const result = data.groupBy((row) => moment(row.due_date).startOf(aggregation).format("YYYY-MM"))
  .select((group) => ({
    type: group.first().type,
    due_date: group.first().due_date,
    amount: group
      .deflate((row) => row.amount)
      .aggregate((prev: number, value: number) =>
        numeral(prev).add(value).value() // Use Numeral to add two numbers, returning the resulting value
      )
  }))

And correctly returns

__index__   type     due_date    amount    
----------  -------  ----------  ----------
2020-01-01  payable  2020-01-01  -300046.79
2020-02-01  payable  2020-02-01  -399881.58
2020-03-01  payable  2020-03-01  -4724.58  
2020-04-01  payable  2020-04-01  29564.52  
2021-01-01  payable  2021-01-01  1750

I created a separate CodeSandbox with the actual code so you can play with it: https://codesandbox.io/s/eloquent-elbakyan-16v5i?file=/src/App.tsx

With this in mind, I'd like to open up the discussion of handling floating point precision in data-forge.

The way I see it, we can either

a) Provide support for floating point precision using numeral This would improve the developer experience of this library as floating point precision "just works". (For the manipulation functions supported by numeral (add, subtract, multiply, divide))

e.g.

amount: group.deflate((row) => row.amount).sum(true) // pass in "true" to allow floating point precision

or a separate util function

amount: group.deflate((row) => row.amount).sumPrecise()

b) Clearly document that data-forge does not handle floating point precision out-of-the-box and provide an example on how to keep floating point precision using numeral or another library (numbro, bigdecimal.js etc.).

Curious to hear your thoughts about this!

data-forge / data-forge-ts

[Discussion] Floating point precision #104