dask / dask-expr

BSD 3-Clause "New" or "Revised" License

`DataFrame.head()` only returns rows from first partition of `dask.datasets.timeseries` #1013

Closed charlesbluca closed 5 months ago

charlesbluca commented 6 months ago

Describe the issue: `head()` appears to return rows only from the first partition of `dask.datasets.timeseries`, even when `npartitions` is overridden.

Minimal Complete Verifiable Example:

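from dask.datasets import timeseries

# Daily frequency over the default date range gives 30 rows in total.
ddf = timeseries(freq="1d")
print(len(ddf))
# 30

# npartitions=-1 should read from every partition, yet only one row comes back.
print(ddf.head(30, npartitions=-1))
#              name    id         x         y
# timestamp
# 2000-01-01  Edith  1061 -0.843924 -0.443468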
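
For reference, here is a rough pure-Python model of the result I would expect from `head(n, npartitions=...)`. It is not dask's implementation: `expected_head` is a hypothetical helper, and the layout of 30 single-row partitions is an assumption inferred from the output above.

```python
from itertools import islice


def expected_head(partitions, n, npartitions=1):
    """Return the first n rows drawn from the first npartitions partitions.

    Hypothetical model of the expected semantics, not dask's code.
    npartitions=-1 means "use every partition".
    """
    if npartitions == -1:
        npartitions = len(partitions)
    if not 0 < npartitions <= len(partitions):
        raise ValueError(f"npartitions must be -1 or in 1..{len(partitions)}")
    # Walk the selected partitions in order and stop after n rows.
    rows = (row for part in partitions[:npartitions] for row in part)
    return list(islice(rows, n))


# Assumed layout: 30 partitions holding one daily row each.
partitions = [[f"2000-01-{day:02d}"] for day in range(1, 31)]

first_only = expected_head(partitions, 30)                 # default: first partition only
all_parts = expected_head(partitions, 30, npartitions=-1)  # every partition

print(len(first_only))
print(len(all_parts))
```

Under this model the default call yields a single row, while `npartitions=-1` yields all 30, which is the result the example above was expected to print.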

Environment: