data-forge / data-forge-ts

The JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.
http://www.data-forge-js.com/
MIT License
1.34k stars 77 forks source link

Rolling window with select function not behaving correctly #140

Open e-tang opened 2 years ago

e-tang commented 2 years ago

When using rollingwindow with the select function, the rolling windows are correct but select function calls the transformer more than it should.

Please see the following test code:

import { assert, expect } from 'chai';
import 'mocha';
import * as dataForge from './index';
import { DataFrame, Series, ISeries, Index } from './index';

var df = new DataFrame({ index: [10, 20, 30, 40, 50], values: [0, 1, 2, 3, 4] });
var counter = 0;
var windows = df
.rollingWindow(2);

var testSeries = windows.select((window) => {
    console.log("counter:" + counter);
    return [window.getIndex().last(), ++counter];
})   
.withIndex(pair => pair[0])
.inflate(pair => pair[1]);

The following are the outputs:

counter:0
counter:1
counter:2
counter:3
counter:4
counter:5
counter:6
counter:7
counter:8
__index__
---------
20       
30       
40       
50  

I think only 4 times should be the correct number.

e-tang commented 2 years ago

Based on the testing, particularly the first window got called twice

e-tang commented 2 years ago

Tried to use map function on Series directly it turned out the same, the function got called more than it should. So it should be the bug in the iterator.

e-tang commented 2 years ago

OKay, figured out why this happened.

After going through the source code, find out whenever the series is baked, the internal iterators are called and trigger the selector function. During this process, the selector function can be call numerous times from getColumnNames(), toArray(), and toPairs().

ashleydavis commented 2 years ago

I believe this is due to Data-Forge's laziness and you need to write the transformer function so that it can execute without side effects.

I do realise this is a problem and I'm considering removing the the laziness in Data-Forge v2.

I've labelled this issue and will return to it for v2.