bluenote10 / NimData

DataFrame API written in Nim, enabling fast out-of-core data processing
MIT License

[feature] Transpose rows/columns of DataFrame #53

Open hiteshjasani opened 4 years ago

hiteshjasani commented 4 years ago

I have some data in a different format from what DataFrame expects. Looking around, I found that something like pandas' transpose is what I need. It would be handy to have in NimData.

For example:

Name   jan  feb  mar
james  1    2    3
sally  4    5    6
wendy  7    8    9

into

Date  james  sally  wendy
jan   1      4      7
feb   2      5      8
mar   3      6      9
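A minimal sketch of the requested operation in plain Python (not NimData code; the function name `transpose_table` and its `new_key` parameter are hypothetical). The values of the first column become the new header, and the old header, minus its first entry, becomes the new first column.

```python
def transpose_table(header, rows, new_key="Date"):
    """Transpose a table whose first column holds row labels.

    header:  column names, e.g. ["Name", "jan", "feb", "mar"]
    rows:    rows of the form [label, v1, v2, ...]
    new_key: name of the first column in the result (hypothetical parameter)
    """
    labels = [row[0] for row in rows]  # e.g. james, sally, wendy
    new_header = [new_key] + labels
    # zip(*...) turns the value columns of the input into rows
    value_columns = zip(*(row[1:] for row in rows))
    new_rows = [[name] + list(col) for name, col in zip(header[1:], value_columns)]
    return new_header, new_rows


header = ["Name", "jan", "feb", "mar"]
rows = [["james", 1, 2, 3], ["sally", 4, 5, 6], ["wendy", 7, 8, 9]]
new_header, new_rows = transpose_table(header, rows)
print(new_header)  # ['Date', 'james', 'sally', 'wendy']
for r in new_rows:
    print(r)
```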

bluenote10 commented 4 years ago

I'm not sure this can work well with static typing (and I never felt this is a particularly useful operation on heterogeneous data): the fields need to be known at compile time, but after transposing, the field names are the values of the first column, which are only known at runtime.
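To illustrate the typing concern with a Python analogy (not NimData's API): the input schema can be declared up front as a typed record, but the transposed record's field names are the data values `james`, `sally` and `wendy`. The sketch below, with the hypothetical helper `transpose_rows`, therefore has to create the output type at runtime with `collections.namedtuple`, so a static type checker cannot know the output's fields.

```python
from collections import namedtuple
from typing import NamedTuple


class Row(NamedTuple):
    # Input schema: known before any data is read.
    Name: str
    jan: int
    feb: int
    mar: int


def transpose_rows(rows):
    # The output field names are the *values* of the first column,
    # so the output type can only be created at runtime.
    # namedtuple raises ValueError for duplicate or invalid field names.
    labels = [r.Name for r in rows]
    Out = namedtuple("Out", ["Date"] + labels)
    months = Row._fields[1:]  # ("jan", "feb", "mar")
    return [Out(m, *(getattr(r, m) for r in rows)) for m in months]


data = [Row("james", 1, 2, 3), Row("sally", 4, 5, 6), Row("wendy", 7, 8, 9)]
out = transpose_rows(data)
print(out[0])  # Out(Date='jan', james=1, sally=4, wendy=7)
```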

hiteshjasani commented 4 years ago

That's why my example was basically a matrix transpose on homogeneous data. The user would have to define the target schema as well, so everything would be known at compile time, and a runtime error would be raised if the actual dataset didn't match.
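A rough sketch of this proposal in Python, with hypothetical names (`MonthRow`, `transpose_into`): the caller declares the target schema, so its fields are fixed before any data is read, and a mismatch between the declared fields and the actual row labels surfaces as a runtime error. This sketch also assumes the labels must appear in the same order as the declared fields.

```python
from typing import NamedTuple


class MonthRow(NamedTuple):
    # Target schema declared by the user up front (hypothetical example).
    Date: str
    james: int
    sally: int
    wendy: int


def transpose_into(target, header, rows):
    """Transpose homogeneous rows [label, v1, v2, ...] into `target` records."""
    labels = [row[0] for row in rows]
    expected = list(target._fields[1:])
    if labels != expected:
        # The declared schema and the runtime data disagree.
        raise ValueError(f"expected row labels {expected}, got {labels}")
    columns = zip(*(row[1:] for row in rows))
    return [target(name, *col) for name, col in zip(header[1:], columns)]


header = ["Name", "jan", "feb", "mar"]
rows = [["james", 1, 2, 3], ["sally", 4, 5, 6], ["wendy", 7, 8, 9]]
print(transpose_into(MonthRow, header, rows)[1])  # MonthRow(Date='feb', james=2, sally=5, wendy=8)
```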

I'm not sure if it's possible with Nim's metaprogramming, but one idea would be to use object variants to represent an abstract cell: cells would just be placeholders that can hold any type of data. Transposing would then simply mean moving cells around in a two-dimensional array.
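A Python approximation of the cell idea, assuming a tagged record (`Cell` with a `Kind` tag) as a stand-in for a Nim object variant; `cell` and `transpose_grid` are hypothetical helpers. Once every value is wrapped in the same placeholder type, transposing is just re-indexing a two-dimensional array.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    INT = "int"
    FLOAT = "float"
    STR = "str"


@dataclass(frozen=True)
class Cell:
    # Rough analogue of a Nim object variant: a `kind` tag plus a payload.
    kind: Kind
    value: object


def cell(v):
    # bool is a subclass of int in Python; it is not handled specially here.
    if isinstance(v, int):
        return Cell(Kind.INT, v)
    if isinstance(v, float):
        return Cell(Kind.FLOAT, v)
    return Cell(Kind.STR, str(v))


def transpose_grid(grid):
    # With uniform Cell placeholders, transposing only moves cells around.
    return [list(col) for col in zip(*grid)]


raw = [["Name", "jan"], ["james", 1], ["sally", 2.5]]
grid = [[cell(x) for x in row] for row in raw]
t = transpose_grid(grid)
print([c.kind.name for c in t[1]])  # ['STR', 'INT', 'FLOAT']
```

Note that the transposed row `t[1]` mixes kinds, so type information is only tracked per cell rather than per column.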

bluenote10 commented 4 years ago

> a matrix transpose on homogeneous data.

For homogeneous data, it's better to use a library like Arraymancer, where transpose is a natural operation. The point of this library is to handle heterogeneous data with single-pass iterator semantics, which doesn't fit well with transposing.
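A small Python sketch (not NimData's implementation; `transpose_stream` and `map_stream` are hypothetical names) of why transposing clashes with single-pass iteration. The first output row needs the first value of every input row, including the last one, so the whole input must be buffered before anything can be emitted. A row-wise operation, by contrast, only ever holds the current row.

```python
def transpose_stream(rows):
    """Transpose an iterator of equal-length rows.

    Nothing can be yielded until the input is exhausted, because output
    row i needs element i of *every* input row, so the buffer grows with
    the full size of the data set.
    """
    buffered = list(rows)  # materializes the entire single-pass input
    for column in zip(*buffered):
        yield list(column)


def map_stream(rows, f):
    # Contrast: a row-wise operation needs only the current row in memory.
    for row in rows:
        yield f(row)


source = iter([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # single-pass iterator
print(list(transpose_stream(source)))  # [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
print(list(source))  # [] because the transpose consumed the iterator
print(list(map_stream(iter([[1, 2], [3, 4]]), sum)))  # [3, 7]
```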