Closed: bschreck closed this issue 6 years ago.
row_group_offsets=[0, 10000, 20000]: your data is only 1000 rows long, so these offsets do not make sense. The traceback is not very helpful, I agree...
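To see why those offsets don't fit a 1000-row table, here is a minimal pure-Python sketch of the data[start:end] slicing that appears in the write_simple frame of the traceback further down. The helper names (row_group_bounds, row_group_sizes, offsets_make_sense) are hypothetical and not part of fastparquet; range(n_rows) stands in for the DataFrame's row positions. It only shows that the second and third row groups would be empty. As the rest of the thread shows, that was not what caused the TypeError.

```python
def row_group_bounds(row_group_offsets):
    """Pair each offset with the next one; the last group runs to the end.

    Mirrors the data[start:end] slicing seen in the write_simple frame
    (end is None for the final row group).
    """
    bounds = []
    for i, start in enumerate(row_group_offsets):
        end = row_group_offsets[i + 1] if i + 1 < len(row_group_offsets) else None
        bounds.append((start, end))
    return bounds


def row_group_sizes(n_rows, row_group_offsets):
    """Rows that would land in each row group for a table of n_rows rows."""
    rows = range(n_rows)  # stand-in for the DataFrame's row positions
    return [len(rows[start:end]) for start, end in row_group_bounds(row_group_offsets)]


def offsets_make_sense(n_rows, row_group_offsets):
    """Hypothetical sanity check: offsets start at 0, strictly increase,
    and every offset points at an existing row."""
    offs = list(row_group_offsets)
    return (
        len(offs) > 0
        and offs[0] == 0
        and all(a < b for a, b in zip(offs, offs[1:]))
        and all(o < n_rows for o in offs)
    )


print(row_group_sizes(1000, [0, 10000, 20000]))     # [1000, 0, 0]
print(offsets_make_sense(1000, [0, 10000, 20000]))  # False
print(offsets_make_sense(30000, [0, 10000, 20000])) # True
```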
But it also doesn't work without any arguments, or with 10,000 rows. Same error.
It does appear to work for me without arguments. Can you please try with the master version of fastparquet? (e.g., pip install git+https://github.com/dask/fastparquet)
Same error (I used a fresh virtualenv, too). I'm on macOS High Sierra, using Python 3.6.4.
```
~/miniconda3/envs/d3m_new_new/lib/python3.6/site-packages/fastparquet/writer.py in write(filename, data, row_group_offsets, compression, file_scheme, open_with, mkdirs, has_nulls, write_index, partition_on, fixed_text, append, object_encoding, times)
807 if file_scheme == 'simple':
808 write_simple(filename, data, fmd, row_group_offsets,
--> 809 compression, open_with, has_nulls, append)
810 elif file_scheme in ['hive', 'drill']:
811 if append:

~/miniconda3/envs/d3m_new_new/lib/python3.6/site-packages/fastparquet/writer.py in write_simple(fn, data, fmd, row_group_offsets, compression, open_with, has_nulls, append)
704 else None)
705 rg = make_row_group(f, data[start:end], fmd.schema,
--> 706 compression=compression)
707 if rg is not None:
708 fmd.row_groups.append(rg)

~/miniconda3/envs/d3m_new_new/lib/python3.6/site-packages/fastparquet/writer.py in make_row_group(f, data, schema, compression)
601 comp = compression
602 chunk = write_column(f, data[column.name], column,
--> 603 compression=comp)
604 rg.columns.append(chunk)
605 rg.total_byte_size = sum([c.meta_data.total_uncompressed_size for c in

~/miniconda3/envs/d3m_new_new/lib/python3.6/site-packages/fastparquet/writer.py in write_column(f, data, selement, compression)
541 data_page_header=dph, crc=None)
542
--> 543 write_thrift(f, ph)
544 f.write(bdata)
545

~/miniconda3/envs/d3m_new_new/lib/python3.6/site-packages/fastparquet/thrift_structures.py in write_thrift(fobj, thrift)
49 pout = TCompactProtocol(fobj)
50 try:
---> 51 thrift.write(pout)
52 fail = False
53 except TProtocolException as e:

~/miniconda3/envs/d3m_new_new/lib/python3.6/site-packages/fastparquet/parquet_thrift/parquet/ttypes.py in write(self, oprot)
1084 def write(self, oprot):
1085 if oprot._fast_encode is not None and self.thrift_spec is not None:
-> 1086 oprot.trans.write(oprot._fast_encode(self, (self.__class__, self.thrift_spec)))
1087 return
1088 oprot.writeStructBegin('PageHeader')

TypeError: expecting list of size 2 for struct args
```
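Because the failure surfaces inside thrift's encoder rather than in fastparquet's own code, it helps to include the interpreter, platform, and package versions in a report like this. The following is a minimal sketch using only the standard library (importlib.metadata on Python 3.8+, with a pkg_resources fallback for older interpreters such as the 3.6.4 used here). The function names dist_version and environment_report are hypothetical helpers, not part of fastparquet.

```python
import platform


def dist_version(name):
    """Installed version of distribution `name`, or None if it isn't installed."""
    try:
        from importlib.metadata import version, PackageNotFoundError  # Python 3.8+
    except ImportError:  # older interpreters: fall back to setuptools
        import pkg_resources
        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def environment_report(packages=("fastparquet", "thrift")):
    """One line each for Python, the platform, and every requested package."""
    lines = [
        "Python " + platform.python_version(),
        "Platform " + platform.platform(),
    ]
    for name in packages:
        lines.append("%s %s" % (name, dist_version(name) or "not installed"))
    return "\n".join(lines)


if __name__ == "__main__":
    print(environment_report())
```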
Are you using thrift version 0.10.0?
0.11.0
That does seem to be the issue. Installing 0.10.0 fixes it. Maybe update your requirements to force 0.10.0 exactly?
Thrift 0.11.0 was released on PyPI on 2018-01-11 and is not available on conda yet. Would you mind trying with v0.10.0?
Yeah, it works with 0.10.0.
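Until a fixed fastparquet release is available, a project could guard against the incompatible thrift at startup instead of waiting for the TypeError mid-write. This is a hypothetical sketch, not something fastparquet does. Its only data points come from this thread: 0.10.0 worked and 0.11.0 failed. So it conservatively flags anything at or above 0.11.0, and that threshold should be revisited once a fix ships. The version string could come from, e.g., importlib.metadata.version("thrift") on Python 3.8+.

```python
import re
import warnings

# Data points from this thread only: thrift 0.10.0 worked, while 0.11.0 raised
# "TypeError: expecting list of size 2 for struct args" in write_thrift.
LAST_KNOWN_GOOD = (0, 10, 0)
FIRST_KNOWN_BAD = (0, 11, 0)


def parse_version(text):
    """Turn '0.11.0' into (0, 11, 0), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.group() != piece:  # e.g. '0rc1': keep the 0, drop the suffix
            break
    return tuple(parts)


def thrift_version_ok(text):
    """Return True for versions below 0.11.0; otherwise warn and return False."""
    if parse_version(text) >= FIRST_KNOWN_BAD:
        warnings.warn(
            "thrift %s is not known to work here; thrift %s was reported to work"
            % (text, ".".join(map(str, LAST_KNOWN_GOOD))),
            RuntimeWarning,
        )
        return False
    return True


print(parse_version("0.11.0"))      # (0, 11, 0)
print(thrift_version_ok("0.10.0"))  # True
```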
cc @mariusvniekerk
@bschreck, thanks for noticing and reporting this issue. Once it is fixed, we had better release a new version of fastparquet, hopefully soon after the new thrift conda package arrives.
Is there something simple I'm missing here? I'm just trying to do the most basic thing in the example:
Same error on my local Mac and on a remote EC2 Ubuntu 16.04 instance.