Closed martindurant closed 1 year ago
@jrbourbeau, I'll merge this when it passes, and that should be enough to make dask CI happy.
Thanks for fixing so quickly @martindurant!
Will there be a release out with this patch soon? We use releases in most CI builds (one build uses main for fastparquet). If not, I'll just add some skip logic.
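For reference, the "skip logic" mentioned above could look roughly like the sketch below. It is only an illustration: `FIXED_IN` is a placeholder, since the thread does not say which fastparquet release will carry the patch, and `parse_version`, `installed_version`, and `needs_skip` are hypothetical helper names. It uses the standard-library `unittest.skipIf` to stay self-contained; in a pytest-based suite the same condition would presumably go into a `pytest.mark.skipif` marker instead.

```python
import unittest
from importlib import metadata

# Placeholder: the first fastparquet release assumed to contain the fix.
# The thread does not name a version, so this value is NOT a real release number.
FIXED_IN = (9999, 0, 0)


def parse_version(text):
    """Turn '2023.10.1' or '2.0.0rc1' into a tuple of up to three ints."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def installed_version(package):
    """Return the parsed version of an installed package, or None if absent."""
    try:
        return parse_version(metadata.version(package))
    except metadata.PackageNotFoundError:
        return None


def needs_skip(version, fixed_in=FIXED_IN):
    """Skip when the package is missing or older than the fixed release."""
    return version is None or version < fixed_in


class TestTimestamp96(unittest.TestCase):
    @unittest.skipIf(
        needs_skip(installed_version("fastparquet")),
        "needs a fastparquet release that includes the int96 fix",
    )
    def test_timestamp96(self):
        # The body of the real test would go here.
        pass
```

Once a release with the patch is out, `FIXED_IN` would be set to that version and the skip would stop applying to up-to-date environments.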
> Will there be a release out with this patch soon?
Yes, since the windows-py3.12 wheel failed to build in the last round anyway.
@jrbourbeau, would you mind running your main-branch CI somewhere to see if the failures go away?
Locally I'm getting the same error
____________________________________________________________________________________________________________________________________ test_timestamp96 _____________________________________________________________________________________________________________________________________
tmpdir = local('/private/var/folders/h0/_w6tz8jd3b9bk6w7d_xpg9t40000gn/T/pytest-of-james/pytest-21/test_timestamp960')
    @FASTPARQUET_MARK
    def test_timestamp96(tmpdir):
        fn = str(tmpdir)
        df = pd.DataFrame({"a": [pd.to_datetime("now", utc=True)]})
        ddf = dd.from_pandas(df, 1)
        ddf.to_parquet(fn, engine="fastparquet", write_index=False, times="int96")
        pf = fastparquet.ParquetFile(fn)
        assert pf._schema[1].type == fastparquet.parquet_thrift.Type.INT96
>       out = dd.read_parquet(fn, engine="fastparquet", index=False).compute()
dask/dataframe/io/tests/test_parquet.py:1883:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
dask/base.py:342: in compute
    (result,) = compute(self, traverse=False, **kwargs)
dask/base.py:628: in compute
    results = schedule(dsk, keys, **kwargs)
dask/dataframe/io/parquet/core.py:96: in __call__
    return read_parquet_part(
dask/dataframe/io/parquet/core.py:654: in read_parquet_part
    dfs = [
dask/dataframe/io/parquet/core.py:655: in <listcomp>
    func(
dask/dataframe/io/parquet/fastparquet.py:1075: in read_partition
    return cls.pf_to_pandas(
dask/dataframe/io/parquet/fastparquet.py:1115: in pf_to_pandas
    df, views = pf.pre_allocate(size, columns, categories, index)
../../../mambaforge/envs/dask-py310/lib/python3.10/site-packages/fastparquet/api.py:797: in pre_allocate
    df, arrs = _pre_allocate(size, columns, categories, index, cats,
../../../mambaforge/envs/dask-py310/lib/python3.10/site-packages/fastparquet/api.py:1051: in _pre_allocate
    df, views = dataframe.empty(dtypes, size, cols=cols, index_names=index,
../../../mambaforge/envs/dask-py310/lib/python3.10/site-packages/fastparquet/dataframe.py:202: in empty
    values = type(bvalues)._from_sequence(values, copy=False, dtype=bvalues.dtype)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> ???
E ValueError: Buffer has wrong number of dimensions (expected 1, got 2)
pandas/_libs/tslibs/tzconversion.pyx:187: ValueError
Note that the line changed in this PR looks similar, but not identical, to the line where the error is being raised. Maybe both lines need the same sort of update.
What's your pandas version?
In [1]: import pandas as pd
In [2]: pd.__version__
Out[2]: '1.5.3'
OK, then I think all the pandas versions I have locally and in the tests are too new... Hold on.
Fixes #897