Gmousse / dataframe-js

No Maintenance Intended
https://gmousse.gitbooks.io/dataframe-js/
MIT License

df.union() with mismatching column names #15

Closed lmeyerov closed 7 years ago

lmeyerov commented 7 years ago

When a.union(b) is called and a and b have different column names, I'd expect the concat to still work. Currently, it throws an exception.

In the interim, I'm wrapping with:

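function unionDFs(a, b) {
    // Column names of each dataframe, used in the filters below.
    const aCols = a.listColumns();
    const bCols = b.listColumns();

    // Columns present in one dataframe but missing from the other.
    const aNeeds = b.listColumns().filter((v) => aCols.indexOf(v) === -1);
    const bNeeds = a.listColumns().filter((v) => bCols.indexOf(v) === -1);

    // Pad each dataframe with its missing columns, filled with 'n/a'.
    const a2 = aNeeds.reduce((df, name) => df.withColumn(name, () => 'n/a'), a);
    const b2 = bNeeds.reduce((df, name) => df.withColumn(name, () => 'n/a'), b);

    return a2.union(b2);
}
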
Gmousse commented 7 years ago

Hey, can you send me the error raised by your code?

Gmousse commented 7 years ago

OK, thanks to your issue I have detected a bug in .union().

The exception itself is the expected behaviour: .union() checks whether the 2 dataframes have the same columns, and if they don't, it throws. BUT if the 2 dataframes have the same schema (same column names) in a different order, it doesn't throw an exception and the result is wrong. Until the hotfix is out (tonight), you can work around it by calling b2.restructure(a2.listColumns()) to reorder the columns before the union.
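To illustrate why column order matters here, the following is a minimal sketch in plain JavaScript. It does not use dataframe-js: makeTable, naiveUnion, reorder and alignedUnion are hypothetical helpers that only model a table as column names plus positional rows, on the assumption that the union appends rows by position. The reorder helper plays the role that restructure() plays in the workaround above.

```javascript
// Hypothetical stand-in for a dataframe: column names plus positional rows.
// This is NOT the dataframe-js API, only a model of the behaviour described above.
function makeTable(columns, rows) {
  return { columns, rows };
}

// Naive union: appends rows by position and ignores b's column order.
function naiveUnion(a, b) {
  return makeTable(a.columns, a.rows.concat(b.rows));
}

// Reorder a table's columns to match `columns`; throws if a column is missing.
function reorder(table, columns) {
  const indexes = columns.map((name) => {
    const i = table.columns.indexOf(name);
    if (i === -1) throw new Error(`Missing column: ${name}`);
    return i;
  });
  return makeTable(columns, table.rows.map((row) => indexes.map((i) => row[i])));
}

// Aligned union: reject differing column counts, then align b to a's order.
// Assumes column names are unique within each table.
function alignedUnion(a, b) {
  if (a.columns.length !== b.columns.length) {
    throw new Error('Column sets differ');
  }
  return naiveUnion(a, reorder(b, a.columns));
}

const a = makeTable(['id', 'name'], [[1, 'x']]);
const b = makeTable(['name', 'id'], [['y', 2]]);

const wrong = naiveUnion(a, b).rows;   // second row stays in b's order: ['y', 2]
const right = alignedUnion(a, b).rows; // second row is realigned: [2, 'y']
```

In the naive version the second row silently lands under the wrong headers, which matches the symptom described: no exception, but a wrong result.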

I will work on it.

I hope this is related to your issue.

Gmousse commented 7 years ago

@lmeyerov The hotfix 1.1.2 resolves the column order bug in .union(). Please confirm that your problem is solved. Thanks

Gmousse commented 7 years ago

@lmeyerov Any news on your issue? Is it solved by version 1.1.2?