blaze / datashape

Language defining a data description protocol
BSD 2-Clause "Simplified" License

Use Python data structures to define datashape #71

Open mrocklin opened 10 years ago

mrocklin commented 10 years ago

We currently use lots of internal data structures like Tuple, Record and DataShape to construct datashapes. Maybe we can get away with the built-in tuple, OrderedDict, and list instead?

`10 * var * {name: string}`

DataShape([10, var, OrderedDict([['name', 'string']])])

This might make traversing these data structures much cleaner. I think that this choice should be mostly invisible to the user.
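To make the traversal point concrete, here is a minimal sketch that walks a plain-Python description and renders it as a datashape-style string. The `render` function and the list/OrderedDict/tuple encoding are hypothetical illustrations of the proposal, not part of the datashape API; dimensions and type names are stood in for by ints and strings.

```python
from collections import OrderedDict


def render(shape):
    """Render a plain-Python shape description as a datashape-style string.

    Hypothetical encoding (an assumption, not the datashape API):
      list        -> dimensions followed by a measure, joined with " * "
      OrderedDict -> record, rendered as {name: type, ...}
      tuple       -> tuple type, rendered as (type, ...)
      other       -> rendered with str(), e.g. 10 or "var" or "string"
    """
    if isinstance(shape, OrderedDict):
        fields = ", ".join(
            "%s: %s" % (name, render(typ)) for name, typ in shape.items()
        )
        return "{" + fields + "}"
    if isinstance(shape, tuple):
        return "(" + ", ".join(render(item) for item in shape) + ")"
    if isinstance(shape, list):
        return " * ".join(render(item) for item in shape)
    return str(shape)


shape = [10, "var", OrderedDict([("name", "string")])]
print(render(shape))  # 10 * var * {name: string}
```

Because every node is a built-in container, the whole traversal is a handful of `isinstance` checks with no custom classes to learn.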

FRidh commented 9 years ago

I can imagine such a change would also make it easier for users to generate a datashape, since you wouldn't have to build a string out of a mapping.

For example, I am currently using a parallel map for computations that output tuples. The tuples contain only data, no meta-information. I would like to write this data to a file using into, which requires a dshape. The first couple of columns contain strings, the others datetimes. I would prefer to generate just a mapping of column names to types in the form of an OrderedDict, and perhaps add var, rather than also having to build a dshape string out of it afterwards.

mrocklin commented 9 years ago

I very much agree. As a temporary fix you can create a datashape from data structures if you are willing to use some of the internal API.

In [1]: import datashape as ds

In [2]: ds.var * ds.Record([['name', ds.string], ['balance', ds.int64]])
Out[2]: dshape("var * {name: string, balance: int64}")
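If the column names and types already live in an OrderedDict, as in the use case described earlier in the thread, the list of pairs passed to `ds.Record` above can be derived from it. A minimal sketch follows; `record_fields` is a hypothetical helper, and the type values are placeholder strings so the snippet runs without datashape installed. In real use they would presumably be objects such as `ds.string` and `ds.int64`.

```python
from collections import OrderedDict


def record_fields(columns):
    """Convert a name -> type mapping into [[name, type], ...] pairs.

    Hypothetical helper: it only reshapes the mapping into the
    list-of-pairs form used in the ds.Record call in the session above.
    """
    return [[name, typ] for name, typ in columns.items()]


# Placeholder type values; with datashape these would be ds.string, ds.int64.
columns = OrderedDict([("name", "string"), ("balance", "int64")])
fields = record_fields(columns)
print(fields)

# Assumed usage with datashape installed (not executed here):
#   import datashape as ds
#   ds.var * ds.Record(record_fields(OrderedDict([("name", ds.string),
#                                                 ("balance", ds.int64)])))
```

The column order of the OrderedDict is preserved, which matters because record fields in a datashape are ordered.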