blaze / castra

Partitioned storage system based on blosc. **No longer actively maintained.**
BSD 3-Clause "New" or "Revised" License

utf-8 encoding problem(?) #57

Closed barnettjacob closed 8 years ago

barnettjacob commented 8 years ago

Hi, apologies if this isn't the right place for this!

I'm trying to create a Castra with:

    import dask.dataframe as dd

    df = dd.read_csv(
        '/home/jacob/av_files/*.csv',
        names=['property_id', 'date', 'available', 'minimum_stay', 'price',
               'destination', 'primary_geo_unit', 'capacity', 'tp_rev_ct',
               'snapshot_date'],
    )
    df.set_index('snapshot_date', compute=False).to_castra('av_test.castra', categories=True)

But I am seeing this error. My data has plenty of non-English characters.

UnicodeDecodeError: 'utf8' codec can't decode byte 0x8e in position 424: invalid start byte

Thanks, Jacob

barnettjacob commented 8 years ago

Looks like a dask issue, actually.
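
For anyone hitting the same traceback: byte 0x8e can only appear in UTF-8 as a continuation byte, never as the first byte of a character, so the error suggests the CSV files are not UTF-8 at all. Below is a minimal sketch, using only the standard library, of how one might probe a few candidate encodings. The sample bytes and the list of candidates are assumptions for illustration; the actual encoding of the files is not known from this thread.

```python
# Sketch: why byte 0x8e breaks a UTF-8 decode, and how to probe candidate encodings.
# The sample bytes and the candidate list are illustrative assumptions,
# not taken from the original CSV files.

def probe_encodings(raw, candidates=("utf-8", "cp1252", "latin-1")):
    """Return {encoding: decoded text, or None if decoding fails}."""
    results = {}
    for name in candidates:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            # Same error class as in the traceback above.
            results[name] = None
    return results


sample = b"Caf\x8e"  # hypothetical bytes containing the offending 0x8e
results = probe_encodings(sample)
for name, text in results.items():
    print(name, "->", repr(text))
```

A successful decode does not prove the encoding is the right one (latin-1 accepts any byte sequence), so the decoded text still needs a visual check. If a candidate looks right, pandas' `read_csv` accepts an `encoding` keyword, and `dd.read_csv` forwards extra keyword arguments to pandas, so passing that encoding there may be worth trying.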