blaze / castra

Partitioned storage system based on blosc. **No longer actively maintained.**
BSD 3-Clause "New" or "Revised" License

Speedup loading of object columns #33

Closed. jcrist closed this 9 years ago.

jcrist commented 9 years ago

For object dtypes, `np.array(data)` is much slower and uses more memory than `np.array(data, object)`. Passing the dtype explicitly makes loading columns of text data ~12 times faster on my computer, and significantly reduces RAM usage when used from dask.
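A minimal sketch of the difference, assuming `data` is a plain Python list of strings (the sample values here are hypothetical, not taken from castra). Without an explicit dtype, NumPy infers a fixed-width unicode dtype sized to the longest string, which is one plausible source of the extra time and memory. With `object`, the array only holds references to the existing string objects.

```python
import numpy as np

# Hypothetical sample data: one short string and one long string.
data = ["a", "b" * 100]

# No dtype given: NumPy infers a fixed-width unicode dtype, so every
# element is allocated as much space as the longest string needs.
inferred = np.array(data)

# dtype given as object: the array stores references to the original
# Python str objects and copies no character data.
explicit = np.array(data, object)

print(inferred.dtype, inferred.nbytes)
print(explicit.dtype, explicit.nbytes)
```

The gap in `nbytes` grows with the length of the longest string in the column, since the inferred array pads every element to that width.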

jcrist commented 9 years ago

This also fixes #32.