Closed phrz closed 5 years ago
Yeah there is! Are you able to use the function renameSeries?
const renamedDf = rawData.renameSeries({ "OldColumnName": "NewColumnName" });
You might be able to use it like this:
let rawData =
DataForge.readFileSync('data.csv')
.parseCSV({ dynamicTyping: true })
let renameSpec = {};
for(let name of rawData.getColumnNames()) {
renameSpec[name] = slugify(name);
}
rawData = rawData.renameSeries(renameSpec);
Otherwise I'll have to think about this, seems like there should be a better way to do it!
Not to overcomplicate things but I switched this little program over to TypeScript to see if it made things easier, so specifically the below code (analogous to your suggestion) did in fact work fine with all my strict tsconfig.json
settings.
let renameSpec: {[index: string]: string;} = {}
for(let name of rawData.getColumnNames()) {
renameSpec[name] = slugify(name)
}
rawData = rawData.renameSeries(renameSpec)
However, I'm curious: given the definition of IColumnRenameSpec, why does this function not accept a renameSpec
that is a Map
? For example, if I produced renameSpec
as follows:
let renameSpec: Map<string, string> =
rawData.getColumnNames().map((s: string) => [s, slugify(s)])
I am by no means a TypeScript expert, so I'm honestly wondering why.
You are correct! It should be using Map already. Thanks for leaving feedback and pointing it out.
I'll leave this issue open until I get it fixed.
Hey I tried using Map
but it didn't seem to work the way I hoped.
It makes the unit tests fail to compile.
Specifically I can't do this with map:
var df = new DataFrame({
columns: {
A: [1, 2, 3, 4],
B: ['a', 'b', 'c', 'd'],
},
index: [11, 12, 13, 14],
});
Where the 'columns' field is a Map an initialized from a literal object.
I extracted the 'columns' initializer to a separately typed variable so I could more clearly see the error. Here is the extracted code:
const columns: Map<string, Iterable<any> | ISeries<any, any>> = {
A: [1, 2, 3, 4],
B: ['a', 'b', 'c', 'd'],
};
This is the error:
[ts]
Type '{ A: number[]; B: string[]; }' is not assignable to type 'Map<string, Iterable<any> | ISeries<any, any>>'.
Object literal may only specify known properties, and 'A' does not exist in type 'Map<string, Iterable<any> | ISeries<any, any>>'.
What do you think the problem is?
Am I trying to use Map
the wrong way?
Thanks for your patience! I did a lot of reading about this topic, and there seems to be no way in Typescript to use polymorphism or generics to treat Objects and Maps the same. Typescript does not have Generic specialization, or otherwise a way to have two functions with the same name accepting different types as parameters. As such, my proof of concept uses (abuses?) instanceof
. Run-time type checking is ugly, but it's the only solution I could find for this specific issue.
I threw this PoC together quickly, but it boils down to some type definitions to make things more readable (only for the sake of example), followed by a function loadDataFrameSpec
that uses instanceof
to decide whether to pass the spec
parameter onto loadDataFrameSpecMap
or loadDataFrameSpecObject
, which can be thought of as the "specialized" versions of the pseudo-generic loadDataFrameSpec
. These dummy functions just iterate over their respective parameters to prove that you can get identical output through different iteration behavior.
I define key
and column
during Map iteration to prove that these elements can be extracted from the Map while maintaining type safety. I don't do this for the Object because I assume you are already successfully doing just that. All in all, sorry this PoC doesn't really tie into your code and use its internal types, I haven't had the time to pore over the source code very much.
// target: es2015
import * as DF from 'data-forge';
type ColumnType = Iterable<any> | DF.ISeries<any,any>;
interface StringIndexedObject<V> { [key: string]: V; };
type DataFrameSpecObject = StringIndexedObject<ColumnType>;
// this function checks type and calls
// the next two functions
function loadDataFrameSpec(spec: any) {
console.log('\nCalled loadDataFrameSpec');
if(spec instanceof Map) {
loadDataFrameSpecMap(spec);
} else if(spec instanceof Object) {
loadDataFrameSpecObject(spec);
} else {
console.warn('Unexpected type! Do something...');
}
}
function loadDataFrameSpecMap(spec: Map<string, ColumnType>) {
console.log('loadDataFrameSpecMap running');
for(let entry of spec.entries()) {
const key: string = entry[0];
const column: ColumnType = entry[1];
console.log(`${key} => ${column}`);
}
}
function loadDataFrameSpecObject(spec: DataFrameSpecObject) {
console.log('loadDataFrameSpecObject running');
for(let key in spec) {
console.log(`${key} => ${spec[key]}`);
}
}
// example
const plainOldObject = {
A: [1, 2, 3, 4],
B: ['a', 'b', 'c', 'd'],
};
const mapInstead = new Map<string,any>([
['A', [1,2,3,4]],
['B', ['a','b','c','d']]
]);
loadDataFrameSpec(plainOldObject);
loadDataFrameSpec(mapInstead);
Called loadDataFrameSpec
loadDataFrameSpecObject running
A => 1,2,3,4
B => a,b,c,d
Called loadDataFrameSpec
loadDataFrameSpecMap running
A => 1,2,3,4
B => a,b,c,d
Wow thanks for doing the research on this.
You are welcome to try and add this. I'll accept pull request for this as long as tests still work and it doesn't complicate the code too much.
Hey Paul,
Just wondering if you've seen my new library yet?
It's called Data-Forge Plot and integrates plotting/charting with Data-Forge.
Please check out my blog post on it.
It's early days yet, but I'm trying to collect feedback on it.
Ashley, I'd be happy to take a look in my free time. I'm still waiting for the opportunity to sit down and work on this patch if possible.
Awesome, let me know what you think!
Hey there. Don't know if this is useful in any way, but I found that a relatively simple solution to rename columns was to access the df.content.columnNames
and df.content.values.columnNames
properties of the dataframe and pass them a new array of column names. You could either just pass a new array like I did, or maybe apply some function (slugify
, like @phrz 's example) to the array in place.
Be careful with the content
property!
This is internal, undocumented and it is marked as private
in TypeScript.
I reserve the right to change the way this works under the hood in the future, which will break your code if you depend on it.
Also the content
property is lazily generated once the datafame is evaluated, it's set to null otherwise so you can't rely on it.
If you'd like to suggest a better API for renaming columns that is more convenient, I'd be happy to look into implementing it.
There doesn't seem to be an action that can be take for this issue, so I'm closing it.
Is there a better way to rename columns in a
DataFrame
than the following code? I could not find any information for renaming columns in the Guide or API doc. In this code, I'm trying to rename columns "Like This" to "like_this".