dask / dask-expr

BSD 3-Clause "New" or "Revised" License

Running `groupby` with multiple keys raises a `TypeError` #1076

Closed hoxbro closed 2 weeks ago

hoxbro commented 4 weeks ago

Describe the issue:

Calling `get_group` on a `groupby` with multiple keys raises a `TypeError` when `dataframe.query_planning` is enabled. The same code works with the classic dask dataframe implementation.

Minimal Complete Verifiable Example:

import dask
import pandas as pd
import numpy as np

dask.config.set({"dataframe.query_planning": True})

import dask.dataframe as dd

data = list("AB" * 10), np.arange(20) % 3, np.arange(20)
df = pd.DataFrame(dict(zip("xyz", data)))
ddf = dd.from_pandas(df, npartitions=2)
ddf.groupby(["x", "y"]).get_group(('A', 0))

Anything else we need to know?:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[4], line 12
     10 df = pd.DataFrame(dict(zip("xyz", data)))
     11 ddf = dd.from_pandas(df, npartitions=2)
---> 12 ddf.groupby(["x", "y"]).get_group(('A', 0))

File ~/miniconda3/envs/holoviz/lib/python3.12/site-packages/dask_expr/_groupby.py:1633, in GroupBy.get_group(self, key)
   1628 @derived_from(
   1629     pd.core.groupby.GroupBy,
   1630     inconsistencies="If the group is not present, Dask will return an empty Series/DataFrame.",
   1631 )
   1632 def get_group(self, key):
-> 1633     return new_collection(GetGroup(self.obj.expr, key, self._slice, *self.by))

File ~/miniconda3/envs/holoviz/lib/python3.12/site-packages/dask_expr/_collection.py:4764, in new_collection(expr)
   4762 def new_collection(expr):
   4763     """Create new collection from an expr"""
-> 4764     meta = expr._meta
   4765     expr._name  # Ensure backend is imported
   4766     return get_collection_type(meta)(expr)

File ~/miniconda3/envs/holoviz/lib/python3.12/functools.py:995, in cached_property.__get__(self, instance, owner)
    993 val = cache.get(self.attrname, _NOT_FOUND)
    994 if val is _NOT_FOUND:
--> 995     val = self.func(instance)
    996     try:
    997         cache[self.attrname] = val

File ~/miniconda3/envs/holoviz/lib/python3.12/site-packages/dask_expr/_expr.py:496, in Blockwise._meta(self)
    493 @functools.cached_property
    494 def _meta(self):
    495     args = [op._meta if isinstance(op, Expr) else op for op in self._args]
--> 496     return self.operation(*args, **self._kwargs)

TypeError: _groupby_get_group() got multiple values for argument 'get_key'
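
For readers unfamiliar with this error message, here is a minimal pure-Python sketch of one way "got multiple values for argument" can arise from a call shaped like `self.operation(*args, **self._kwargs)`. It is an illustration under assumptions, not a diagnosis: `get_group_helper` and `call_like_blockwise` are hypothetical names, and the parameter order `(df, by_key, get_key, columns)` is assumed rather than taken from the dask-expr source. The idea is that if each grouping key is passed as its own positional argument, a second key could spill into the slot that is also filled by keyword.

```python
# Hypothetical stand-in for a helper with an assumed signature shape;
# this is NOT the real dask-expr function.
def get_group_helper(df, by_key, get_key, columns):
    return df, by_key, get_key, columns


def call_like_blockwise(frame, by, get_key, columns):
    # Assumed calling pattern: positional args are the frame plus one
    # entry per grouping key; get_key and columns are passed as keywords.
    args = [frame, *by]
    kwargs = {"get_key": get_key, "columns": columns}
    return get_group_helper(*args, **kwargs)


# One grouping key: "x" binds to by_key, so there is no clash.
single = call_like_blockwise("frame", ["x"], "A", None)

# Two grouping keys: "y" lands in the positional slot for get_key,
# which is then supplied a second time by keyword.
try:
    call_like_blockwise("frame", ["x", "y"], ("A", 0), None)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

If this is indeed what happens, it would explain why a single grouping key is unaffected while a list of keys fails before any data is touched, since the error is raised while computing `_meta`.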

Environment: