lgomez closed this issue 6 years ago
Thanks for logging an issue.
There are always many ways to do things like this; here's one.
This is a bit tricky because your data isn't really stored in the Data-Forge style, but I'll give you a solution where you load the data manually.
First, though, I had to modify your data slightly to make it valid JSON:
// A.json
[
[0, "A1"],
[1, "A2"],
[2, "A3"],
[3, "A4"],
[4, "A5"]
]
// B.json
[
[2, "B1"],
[3, "B2"],
[4, "B3"],
[5, "B4"],
[6, "B5"]
]
Your data can't be loaded directly into a dataframe because it doesn't contain any column names.
So instead I read and parse each file manually and pass the parsed data into a Series:
const dataForge = require("data-forge");
const fs = require('fs');
let a = new dataForge.Series(JSON.parse(fs.readFileSync("A.json")));
let b = new dataForge.Series(JSON.parse(fs.readFileSync("B.json")));
Now I inflate each series to a dataframe and separate out the index and A and B columns:
let aDF = a.inflate(row => ({ index: row[0], A: row[1] }));
let bDF = b.inflate(row => ({ index: row[0], B: row[1] }));
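If it helps to see what the inflate step amounts to, here is a rough plain-JavaScript sketch that doesn't use Data-Forge at all. The helper name `toRecords` is hypothetical and made up for illustration; it only turns each [index, value] pair into an object with named fields, which is the same shape the selector passed to inflate returns.

```javascript
// Hypothetical helper (not part of Data-Forge): converts [index, value] pairs
// into objects with an "index" field and a value field named after the column.
function toRecords(rows, columnName) {
    return rows.map(row => ({ index: row[0], [columnName]: row[1] }));
}

// Same data as A.json above, inlined so the sketch runs on its own.
const rowsA = [[0, "A1"], [1, "A2"], [2, "A3"], [3, "A4"], [4, "A5"]];
const recordsA = toRecords(rowsA, "A");
console.log(recordsA[0]);
```

Each record now has the named fields that the join further down relies on.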
At this point I print both dataframes to check what I have:
console.log("a:");
console.log(aDF.toString());
console.log("b:");
console.log(bDF.toString());
I see the following output:
a:
__index__  index  A
---------  -----  --
0          0      A1
1          1      A2
2          2      A3
3          3      A4
4          4      A5
b:
__index__  index  B
---------  -----  --
0          2      B1
1          3      B2
2          4      B3
3          5      B4
4          6      B5
Now I'm ready to join these two dataframes by connecting their index columns and merging their A and B columns:
const final = aDF.joinOuter(bDF,
    rowA => rowA.index, // Column from dataframe a to merge on.
    rowB => rowB.index, // Column from dataframe b to merge on.
    (rowA, rowB) => { // Selector function to merge rows from a and b.
        return {
            index: rowA ? rowA.index : rowB.index, // Merge column 0 as the index.
            A: rowA ? rowA.A : undefined, // Note that we are merging column 1 from a and sometimes the value doesn't exist.
            B: rowB ? rowB.B : undefined // Note that we are merging column 1 from b and sometimes the value doesn't exist.
        };
    }
);
Then print the result to check:
console.log("final:");
console.log(final.toString());
I see this result:
final:
__index__  index  A   B
---------  -----  --  --
0          0      A1
1          1      A2
2          2      A3  B1
3          3      A4  B2
4          4      A5  B3
5          5          B4
6          6          B5
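To show why some cells come out empty, here is a small plain-JavaScript sketch of what an outer join does. It is not Data-Forge's implementation: `outerJoin` is a hypothetical helper written for illustration, it assumes each key appears at most once per side, and it passes undefined for the missing side (the truthiness checks in the merge function work whatever falsy value is used).

```javascript
// Hypothetical helper (not Data-Forge's implementation): full outer join of
// two arrays of records on a key, mirroring the shape of the joinOuter call.
// Assumes keys are unique within each array.
function outerJoin(left, right, leftKey, rightKey, merge) {
    const rightByKey = new Map(right.map(row => [rightKey(row), row]));
    const matched = new Set();
    const result = [];

    // Every left row is emitted, paired with its right match if there is one.
    for (const leftRow of left) {
        const key = leftKey(leftRow);
        const rightRow = rightByKey.get(key);
        if (rightRow !== undefined) {
            matched.add(key);
        }
        result.push(merge(leftRow, rightRow));
    }

    // Right rows that matched nothing are emitted with no left side.
    for (const rightRow of right) {
        if (!matched.has(rightKey(rightRow))) {
            result.push(merge(undefined, rightRow));
        }
    }
    return result;
}

const left = [{ index: 3, A: "A4" }, { index: 4, A: "A5" }];
const right = [{ index: 4, B: "B3" }, { index: 5, B: "B4" }];
const joined = outerJoin(left, right, row => row.index, row => row.index,
    (rowA, rowB) => ({
        index: rowA ? rowA.index : rowB.index,
        A: rowA ? rowA.A : undefined,
        B: rowB ? rowB.B : undefined,
    })
);
console.log(joined);
```

Rows that exist on only one side still appear in the result, with the other side's column left undefined. That is what produces the blank cells in the printed table.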
If you really want exactly the same result as you proposed, you simply need to promote the "index" column to be the actual index of the dataframe, then drop the "index" column, as follows:
const indexed = final.setIndex("index").dropSeries("index");
console.log(indexed.toString());
And get the following output:
__index__  A   B
---------  --  --
0          A1
1          A2
2          A3  B1
3          A4  B2
4          A5  B3
5              B4
6              B5
This is the full code:
const dataForge = require("data-forge");
const fs = require('fs');
let a = new dataForge.Series(JSON.parse(fs.readFileSync("A.json")));
let b = new dataForge.Series(JSON.parse(fs.readFileSync("B.json")));
let aDF = a.inflate(row => ({ index: row[0], A: row[1] }));
let bDF = b.inflate(row => ({ index: row[0], B: row[1] }));
console.log("a:");
console.log(aDF.toString());
console.log("b:");
console.log(bDF.toString());
const final = aDF.joinOuter(bDF,
    rowA => rowA.index, // Column from dataframe a to merge on.
    rowB => rowB.index, // Column from dataframe b to merge on.
    (rowA, rowB) => { // Selector function to merge rows from a and b.
        return {
            index: rowA ? rowA.index : rowB.index, // Merge column 0 as the index.
            A: rowA ? rowA.A : undefined, // Note that we are merging column 1 from a and sometimes the value doesn't exist.
            B: rowB ? rowB.B : undefined // Note that we are merging column 1 from b and sometimes the value doesn't exist.
        };
    }
);
console.log("final:");
console.log(final.toString());
const indexed = final.setIndex("index").dropSeries("index");
console.log(indexed.toString());
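Since you mentioned the data comes from a bunch of files and the file names should become the column titles, here is one possible plain-JavaScript sketch of the same merge for any number of files. It doesn't use Data-Forge: `mergeByIndex` is a hypothetical helper, the data is inlined instead of read from disk, and it assumes the indexes are numbers and unique within each file.

```javascript
// Hypothetical helper (not part of Data-Forge): merges any number of
// [index, value] tables into one row per index, one column per table name.
function mergeByIndex(tables) {
    const rowsByIndex = new Map();
    for (const [name, rows] of Object.entries(tables)) {
        for (const [index, value] of rows) {
            if (!rowsByIndex.has(index)) {
                rowsByIndex.set(index, { index });
            }
            rowsByIndex.get(index)[name] = value;
        }
    }
    // Sort numerically by index so the rows come out in a predictable order.
    return [...rowsByIndex.values()].sort((x, y) => x.index - y.index);
}

// In practice each array would come from JSON.parse(fs.readFileSync(...)),
// keyed by the file name without its extension.
const merged = mergeByIndex({
    A: [[0, "A1"], [1, "A2"], [2, "A3"], [3, "A4"], [4, "A5"]],
    B: [[2, "B1"], [3, "B2"], [4, "B3"], [5, "B4"], [6, "B5"]],
});
console.log(merged);
```

Rows that are missing from a file simply have no property for that column, which corresponds to the blank cells in the output above.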
Please be sure to star the repo!
Ashley,
Thank you so much for this reply. Very useful. Will try it a bit later.
Thank you
Hey Luis,
Just wondering if you've seen my new library yet?
It's called Data-Forge Plot and integrates plotting/charting with Data-Forge.
Please check out my blog post on it.
It's early days yet, but I'm trying to collect feedback on it.
Hi,
Thank you for a great library. I was looking for something like this and read about it in the latest issue of Node Weekly. Started playing with it but haven't been able to get the result I'd like. I hope you don't mind if I ask...
I have the following dataframes:
I need to end up with:
The data comes from a bunch of files that contain one 2D array each structured like this:
Notice how I need the resulting DataFrame to use the file names as the column titles.
I tried using concat and joins but don't quite get this result. Would you mind pointing me in the right direction?
Thank you,