cesium-ml / cesium

Machine Learning Time-Series Platform

Strange codec error on import #262

Closed. profjsb closed this issue 5 years ago.

profjsb commented 6 years ago

When uploading data, I get an error:

web_1 |   File "/cesium_env/lib/python3.6/site-packages/cesium/data_management.py", line 145, in parse_and_store_ts_data
web_1 |     t, m, e = parse_ts_data(ts_path, sep)
web_1 |   File "/cesium_env/lib/python3.6/site-packages/cesium/data_management.py", line 39, in parse_ts_data
web_1 |     ts_data = np.loadtxt(filepath, delimiter=sep, ndmin=2)
web_1 |   File "/cesium_env/lib/python3.6/site-packages/numpy/lib/npyio.py", line 981, in loadtxt
web_1 |     first_line = next(fh)
web_1 |   File "/usr/lib/python3.6/codecs.py", line 321, in decode
web_1 |     (result, consumed) = self._buffer_decode(data, self.errors, final)
web_1 | UnicodeDecodeError: 'utf-8' codec can't decode byte 0x98 in position 99: invalid start byte

But when I loop over all the data files from the tarball with the same np.loadtxt call, I get no errors:

import glob
import numpy as np
for f in glob.glob("Learn1/LC/*.dat"):
    ts_data = np.loadtxt(f, delimiter=",", ndmin=2)
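
One possible reason a loop like this passes while the upload fails: in Python's glob, a * pattern does not match file names that begin with a dot, so any hidden files sitting next to the .dat files would never reach np.loadtxt. The sketch below only illustrates that behavior in a temporary directory. The file names and the helper visible_dat_files are made up for the demonstration and are not part of cesium.

```python
import glob
import os
import tempfile


def visible_dat_files(directory):
    """Return the base names that the *.dat pattern matches in directory."""
    pattern = os.path.join(directory, "*.dat")
    return sorted(os.path.basename(p) for p in glob.glob(pattern))


with tempfile.TemporaryDirectory() as tmp:
    # One ordinary data file and one hypothetical dot-prefixed companion.
    for name in ("1.dat", "._1.dat"):
        with open(os.path.join(tmp, name), "wb") as fh:
            fh.write(b"0,1,2\n")
    matched = visible_dat_files(tmp)       # what the glob loop would see
    everything = sorted(os.listdir(tmp))   # what is actually on disk

print(matched)
print(everything)
```

Comparing the two lists shows whether a glob-based check is skipping files that a tarball upload would still include.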

Files used:
https://gist.github.com/profjsb/3243e6f0129730d20f2c47eed46f731e#file-lc-tar-gz
https://gist.github.com/profjsb/3243e6f0129730d20f2c47eed46f731e#file-master-dat

profjsb commented 6 years ago

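This is due to the ._ files (AppleDouble metadata) that tar on macOS adds to tarballs. They are binary rather than text, so np.loadtxt fails when it tries to decode them as UTF-8. I was able to make a clean tarball with: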

COPYFILE_DISABLE=1 tar -cvzf lc.tar.gz LC/[1-9]*.dat
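
If the tarball cannot be rebuilt, the same files can also be skipped on the reading side. The following is a minimal sketch using only the standard-library tarfile module. It is not cesium's actual code, and the helper names (is_appledouble, data_members, make_demo_tarball) as well as the demo archive contents are assumptions made for illustration.

```python
import io
import os
import tarfile


def is_appledouble(member_name):
    """True if the archive member's base name starts with '._'."""
    return os.path.basename(member_name).startswith("._")


def data_members(tar):
    """Yield regular-file members that are not '._' companion files."""
    for member in tar.getmembers():
        if member.isfile() and not is_appledouble(member.name):
            yield member


def make_demo_tarball():
    """Build an in-memory tar.gz with one data file and one '._' companion."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        # The second payload is arbitrary non-UTF-8 bytes, made up for the demo.
        for name, payload in (("LC/1.dat", b"0.0,1.0,0.1\n"),
                              ("LC/._1.dat", b"\x00\x05\x16\x07\x98")):
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


with tarfile.open(fileobj=make_demo_tarball(), mode="r:gz") as tar:
    kept = [m.name for m in data_members(tar)]

print(kept)
```

Filtering on the base name rather than the full path means the check still works when the ._ files sit inside subdirectories of the archive.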