Is your feature request related to a problem? Please describe.
Daft does not provide a native operation for a global shuffle. The .sample method does not change the row order, and .repartition preserves, within each partition, the rows' relative order from the original dataframe.
import daft

df_a = daft.from_pydict({"char": [f"a{i}" for i in range(10)]})
df_b = daft.from_pydict({"char": [f"b{i}" for i in range(10)]})
df = df_a.concat(df_b)
# Repartitioning alone does not reorder the rows relative to the original dataframe.
df.repartition(1).to_pydict()
Describe the solution you'd like
After assigning each row a random partition, shuffle the rows within each partition. Add an optional way to control the seed.
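The approach described above can be sketched in plain Python to make the intended semantics concrete. This is only an illustration on in-memory lists, not Daft API: the function name `global_shuffle` and its parameters `num_partitions` and `seed` are hypothetical, and a real implementation would run on Daft partitions rather than Python lists.

```python
import random
from typing import Any, List, Optional


def global_shuffle(
    rows: List[Any], num_partitions: int, seed: Optional[int] = None
) -> List[List[Any]]:
    """Hypothetical sketch: scatter rows to random partitions, then shuffle each one."""
    # A dedicated RNG keeps the result reproducible when a seed is given.
    rng = random.Random(seed)

    # Step 1: assign each row to a randomly chosen partition.
    partitions: List[List[Any]] = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[rng.randrange(num_partitions)].append(row)

    # Step 2: shuffle the rows inside each partition.
    for part in partitions:
        rng.shuffle(part)

    return partitions


rows = [f"a{i}" for i in range(10)] + [f"b{i}" for i in range(10)]
parts = global_shuffle(rows, num_partitions=4, seed=42)
```

Passing the same seed twice yields the same partitions, which is the behavior the optional seed is meant to provide. Leaving the seed unset gives a different ordering on each call.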
Describe alternatives you've considered
.sample
)