ChrisJar opened this issue 2 years ago
This comes from partd. Solving this would also help enable groupby.apply
with cudf-backed Dask dataframes: https://github.com/rapidsai/cudf/issues/5755#issuecomment-976823896
Partd no longer depends on the deprecated pandas index class, but the reproducer still throws the same exception.
Nice! Are you saying the traceback you're now seeing still references the removed internal pandas functionality? Do we possibly need to update a dependency pinning somewhere?
Oops, I missed a subtle difference in the trace:
File /opt/conda/envs/rapids/lib/python3.9/site-packages/partd/pandas.py:111, in index_to_header_bytes(ind)
108 cat = None
109 values = ind.values
--> 111 header = (type(ind), {k: getattr(ind, k, None) for k in ind._attributes}, values.dtype, cat)
112 bytes = pnp.compress(pnp.serialize(values), values.dtype)
113 return header, bytes
AttributeError: 'Int64Index' object has no attribute '_attributes'
It looks like _attributes is also gone from pandas indexes?
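The two errors in this thread both come from partd reading a private index API. The first is `_get_attributes_dict`, and the second is `_attributes`, which is read by `index_to_header_bytes` in the traceback above. That failing lookup can be made tolerant with a fallback chain. The sketch below is hypothetical and is not partd's actual code or fix. The helper name `index_attributes` and the fallback attribute list `("name",)` are assumptions for illustration, and the stand-in classes only mimic the three situations and are not real pandas or cudf types.

```python
from typing import Any, Dict

# Hypothetical fallback: public attribute names assumed to be common to index types.
_FALLBACK_ATTRS = ("name",)


def index_attributes(ind: Any) -> Dict[str, Any]:
    """Collect index metadata without relying on a single private API (sketch)."""
    # 1. The `_attributes` name list, as read in the traceback above.
    names = getattr(ind, "_attributes", None)
    if names is not None:
        return {k: getattr(ind, k, None) for k in names}
    # 2. The older `_get_attributes_dict()` method named in the original error.
    getter = getattr(ind, "_get_attributes_dict", None)
    if callable(getter):
        return dict(getter())
    # 3. Neither private API exists: fall back to hypothetical public names.
    return {k: getattr(ind, k, None) for k in _FALLBACK_ATTRS}


# Stand-in classes (hypothetical) for the three cases.
class NewStyleIndex:
    _attributes = ["name"]
    name = "key"


class OldStyleIndex:
    name = "key"

    def _get_attributes_dict(self):
        return {"name": self.name}


class BareIndex:
    # Exposes neither private API, mirroring the Int64Index errors reported here.
    name = "key"


for cls in (NewStyleIndex, OldStyleIndex, BareIndex):
    print(cls.__name__, index_attributes(cls()))
```

Each stand-in yields `{'name': 'key'}`. The silent fallback in step 3 can drop metadata that a real index type carries, so any actual fix would need to settle with the partd maintainers which attributes must round-trip.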
Re "Do we possibly need to update a pinning somewhere?": I don't think so. The latest version on PyPI is 1.3.0, which is what I have in my env.
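The comment above compares the version installed locally with the latest release on PyPI. One way to read the installed version without importing the package is the standard library's `importlib.metadata` (Python 3.8+). This is a minimal sketch; the helper name `installed_version` is hypothetical, and the package list is just the libraries discussed in this thread.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Packages involved in this issue; adjust the list for your environment.
    for dist in ("partd", "pandas", "dask"):
        print(dist, installed_version(dist) or "not installed")
```

Reading distribution metadata avoids importing partd itself, which is useful when an import-time incompatibility is what you are trying to diagnose.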
What happened: Joining tables backed by dask_cudf dataframes with multiple partitions causes the error
AttributeError: 'Int64Index' object has no attribute '_get_attributes_dict'
to be thrown.

Minimal Complete Verifiable Example:
This throws:
Traceback
However, when the same query is performed with CPU dataframes:
Or GPU dataframes with a single partition:
the result is:
Environment: