Open fjetter opened 1 year ago
The arrow backend cannot handle object columns that contain mixed data types.
The internal error that is raised in this example is
ArrowInvalid: ('cannot mix list and non-list, non-null values', 'Conversion failed for column mixed_stuff with type object')
while the user receives a generic shuffle failed exception
RuntimeError: P2P shuffling [id] failed during transfer phase
Reproducing code example
import pandas as pd
import dask.dataframe as dd
import numpy as np
from distributed import Client

with Client() as client:
    df = pd.DataFrame({
        "mixed_stuff": [{"foo": "bar"}, np.array((3,))] * 2,
        "int": [1, 2] * 2,
    })
    ddf = dd.from_pandas(df, npartitions=2)
    ddf.shuffle(on="int").compute()
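Since the generic shuffle error does not name the offending column, a pre-flight check can help locate it before shuffling. The following is a minimal sketch, not part of dask or distributed: `mixed_type_columns` and `_is_missing` are hypothetical helper names, and the check only compares Python types, so it may flag columns that Arrow could in fact convert. It accepts `(name, values)` pairs, which is what `df.items()` yields for a pandas DataFrame; the demo uses a plain dict (with a list standing in for the `np.array`) so that it runs without extra dependencies.

```python
import math
from typing import Any, Dict, Iterable, List, Tuple


def _is_missing(value: Any) -> bool:
    """Treat None and float NaN as missing, so they do not count as a type."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def mixed_type_columns(
    columns: Iterable[Tuple[str, Iterable[Any]]],
) -> Dict[str, List[str]]:
    """Map each column holding more than one Python type to its type names."""
    mixed = {}
    for name, values in columns:
        type_names = {type(v).__name__ for v in values if not _is_missing(v)}
        if len(type_names) > 1:
            mixed[name] = sorted(type_names)
    return mixed


# Plain-Python stand-in for the frame in the report.
columns = {
    "mixed_stuff": [{"foo": "bar"}, [3]] * 2,
    "int": [1, 2] * 2,
}
print(mixed_type_columns(columns.items()))
# prints {'mixed_stuff': ['dict', 'list']}
```

With the DataFrame from the report, the call would be `mixed_type_columns(df.items())`. Columns it reports are candidates for conversion to a single type before calling `shuffle`.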
Were you able to find a solution to this?