hoxbro closed this issue 2 weeks ago.
Describe the issue:
Calling `get_group` on a `groupby` with multiple keys raises a `TypeError` when query planning (dask-expr) is enabled. The same code works with the classic dask DataFrame.
Minimal Complete Verifiable Example:
```python
import dask
import pandas as pd
import numpy as np

dask.config.set({"dataframe.query_planning": True})

import dask.dataframe as dd

data = list("AB" * 10), np.arange(20) % 3, np.arange(20)
df = pd.DataFrame(dict(zip("xyz", data)))
ddf = dd.from_pandas(df, npartitions=2)
ddf.groupby(["x", "y"]).get_group(('A', 0))
```
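For reference, the same call on the plain pandas DataFrame selects the rows where `x == "A"` and `y == 0`, which is what the dask call would be expected to match. This sketch uses only pandas and NumPy and rebuilds the same data as the example.

```python
import numpy as np
import pandas as pd

# Same data as the example: x alternates "A"/"B", y cycles 0..2, z is the row number
data = list("AB" * 10), np.arange(20) % 3, np.arange(20)
df = pd.DataFrame(dict(zip("xyz", data)))

# In pandas, a tuple key selects a single group from a multi-key groupby
result = df.groupby(["x", "y"]).get_group(("A", 0))
print(result)  # rows with index 0, 6, 12 and 18
```

Rows qualify when the index is even (`x == "A"`) and divisible by 3 (`y == 0`), i.e. every sixth row.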
Anything else we need to know?:
```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[4], line 12
     10 df = pd.DataFrame(dict(zip("xyz", data)))
     11 ddf = dd.from_pandas(df, npartitions=2)
---> 12 ddf.groupby(["x", "y"]).get_group(('A', 0))

File ~/miniconda3/envs/holoviz/lib/python3.12/site-packages/dask_expr/_groupby.py:1633, in GroupBy.get_group(self, key)
   1628 @derived_from(
   1629     pd.core.groupby.GroupBy,
   1630     inconsistencies="If the group is not present, Dask will return an empty Series/DataFrame.",
   1631 )
   1632 def get_group(self, key):
-> 1633     return new_collection(GetGroup(self.obj.expr, key, self._slice, *self.by))

File ~/miniconda3/envs/holoviz/lib/python3.12/site-packages/dask_expr/_collection.py:4764, in new_collection(expr)
   4762 def new_collection(expr):
   4763     """Create new collection from an expr"""
-> 4764     meta = expr._meta
   4765     expr._name  # Ensure backend is imported
   4766     return get_collection_type(meta)(expr)

File ~/miniconda3/envs/holoviz/lib/python3.12/functools.py:995, in cached_property.__get__(self, instance, owner)
    993 val = cache.get(self.attrname, _NOT_FOUND)
    994 if val is _NOT_FOUND:
--> 995     val = self.func(instance)
    996     try:
    997         cache[self.attrname] = val

File ~/miniconda3/envs/holoviz/lib/python3.12/site-packages/dask_expr/_expr.py:496, in Blockwise._meta(self)
    493 @functools.cached_property
    494 def _meta(self):
    495     args = [op._meta if isinstance(op, Expr) else op for op in self._args]
--> 496     return self.operation(*args, **self._kwargs)

TypeError: _groupby_get_group() got multiple values for argument 'get_key'
```
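The `get_group` frame unpacks the grouping keys with `*self.by`, and the last frame calls `self.operation(*args, **self._kwargs)`. One plausible reading, assuming the operation keeps the parameter order of classic dask's `_groupby_get_group(df, by_key, get_key, columns)`, is that with two keys the second key is bound to `get_key` positionally and then collides with the `get_key` keyword. The sketch below reproduces only that Python-level error using a hypothetical stand-in, `groupby_get_group_stub`. It is not the dask-expr code, and the real argument layout may differ.

```python
# Hypothetical stand-in mirroring the assumed parameter order
# (df, by_key, get_key, columns); this is NOT the real dask-expr helper.
def groupby_get_group_stub(df, by_key, get_key, columns):
    return df, by_key, get_key, columns


frame = object()   # placeholder for the partition/meta object
by = ["x", "y"]    # two grouping keys, as in the report

# One key: "x" binds to by_key and get_key arrives only as a keyword, so this works
single = groupby_get_group_stub(frame, *by[:1], get_key=("A", 0), columns=None)

# Two keys: "y" is bound to get_key positionally, so the get_key keyword collides
message = None
try:
    groupby_get_group_stub(frame, *by, get_key=("A", 0), columns=None)
except TypeError as exc:
    message = str(exc)
print(message)
```

With a single key the call binds cleanly, which would be consistent with the report that only multi-key groupbys fail.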
Environment:
- Dask version: 2024.5.2
- Python version: 3.12
- Operating System: Linux
- Install method: conda