blaze / datashape

Language defining a data description protocol

Discover on object arrays checks for string values #121

Closed by mrocklin 9 years ago

mrocklin commented 9 years ago
In [1]: from datashape import discover
In [2]: import numpy as np
In [3]: x = np.array([('Alice', 1), ('Bob', 2)], dtype=[('name', 'O'), ('amt', 'i4')])

# Before
In [4]: discover(x)
Out[4]: dshape("2 * {name: object, amt: int32}")

# After
In [4]: discover(x)
Out[4]: dshape("2 * {name: string, amt: int32}")

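This checks the first five values of every array (or record field) identified as object type. If all five are strings, discover reports it as a string column. This is less conservative than before: only those five values are inspected, so a non-string value further down the column would not be noticed.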
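The heuristic described above can be sketched roughly as follows. This is not the code from this PR: the function name `looks_like_string_column` and the `sample_size` parameter are hypothetical, and treating an empty column as "not string" is an assumption made for the sketch.

```python
from itertools import islice


def looks_like_string_column(values, sample_size=5):
    """Hypothetical helper: True if the first `sample_size` values are all str.

    `values` can be any iterable, e.g. one field of a NumPy object array.
    """
    sample = list(islice(values, sample_size))
    # Assumption: an empty sample gives no evidence, so stay conservative.
    return bool(sample) and all(isinstance(v, str) for v in sample)


names = ['Alice', 'Bob']
mixed_late = ['a', 'b', 'c', 'd', 'e', 6]   # non-string after the sample window
mixed_early = ['a', 1]

print(looks_like_string_column(names))        # True
print(looks_like_string_column(mixed_late))   # True: the 6th value is never inspected
print(looks_like_string_column(mixed_early))  # False
```

The second call shows the trade-off: sampling keeps discovery cheap on large arrays, at the cost of mislabeling columns whose non-string values appear only after the sampled prefix.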