data-forge / data-forge-ts

The JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.
http://www.data-forge-js.com/
MIT License
1.34k stars 77 forks source link

data-forge not working nicely with collectionsjs #7

Closed btruhand closed 6 years ago

btruhand commented 6 years ago

First and foremost, I do not think this is a fault in data-forge but I feel like I should raise the issue here so others and author(s) are aware of it

I have the following code:

const dataForge = require('data-forge');
const util = require('util');
//const c = require('collections/fast-map');

function test() {
  const timestamps = [ [ '2018-05-21 15:38:04' ],
    [ '2018-05-21 15:38:09' ],
    [ '2018-05-21 15:38:09' ],
    [ '2018-05-21 15:38:08' ] ];
  const tsDataFrame = new dataForge.DataFrame({
    columnNames: ['Timestamp'],
    rows: timestamps
  }).setIndex('Timestamp');
  let groupedDf = tsDataFrame.groupBy(row => row.Timestamp).select(tsGroup => ({
    Timestamp: tsGroup.first().Timestamp,
    QPS: tsGroup.count()
  })).inflate();
  console.log(groupedDf.toString());
}

test();

Running the above code will give me the following result

__index__  Timestamp            QPS
---------  -------------------  ---
0          2018-05-21 15:38:04  1
1          2018-05-21 15:38:09  2
2          2018-05-21 15:38:08  1

which is what I expected

However if //const c = require('collections/fast-map'); is uncommented, running it again I will get

__index__  [object Object]  false
---------  ---------------  -----
0
1
2

Clearly this is a mistake. After hours of debugging I can at least spot one possible reason for the error. In the build version of data-forge within DataFrame.prototype.toString function we have the following (cut down for brevity's sake)

    DataFrame.prototype.toString = function () {
        var columnNames = this.getColumnNames();
        var header = ["__index__"].concat(columnNames);
        // more things down below
    };

Doing console.log(columnNames) with collectionsjs required gives me the following:

[ SelectIterable {
    iterable: [ [Object], [Object], [Object] ],
    selector: [Function] },
  false ]

Without collectionsjs I will get the expected result: [ 'Timestamp', 'QPS' ]

Inspecting further the getColumnNames function tells me that Array.from is used which is overridden by collectionsjs implementation: https://github.com/montagejs/collections/blob/master/shim-array.js#L26

I managed to fix things on data-forge side by doing a seemingly unnecessary function call:

let groupedDf = tsDataFrame.groupBy(row => row.Timestamp).select(tsGroup => ({
  Timestamp: tsGroup.first().Timestamp,
  QPS: tsGroup.count()
})).inflate().resetIndex();

This will give me the correct result regardless if collectionsjs was used or not

There's an issue raised already in collectionsjs regarding Array.from https://github.com/montagejs/collections/issues/169 and there is also a PR https://github.com/montagejs/collections/pull/173. I'm not sure about the progress of either

ashleydavis commented 6 years ago

Thanks for raising the issue.

If I remove the use of Array.from would that solve the problem?

btruhand commented 6 years ago

I never did try... though I personally think it's better off keeping data-forge code as is. I'd write it down as a bug on collectionsjs than it is of data-forge

You can either just close this ticket, or put some labels that indicate that this is only some form of warning (or perhaps document it in a wiki)

ashleydavis commented 6 years ago

Ok thanks for logging the issue.

Please be sure to put a star on the repo!