databricks / koalas

Koalas: pandas API on Apache Spark
Apache License 2.0

Groupby apply with as_index=False #2133

Open kismsu opened 3 years ago

kismsu commented 3 years ago

Hi, is there any reason why running apply on a grouped DataFrame with as_index=False does not return the group keys as columns?

I see a great benefit in creating a generic apply function with a base schema, like

import pandas as pd
import databricks.koalas as ks

def my_func(df: pd.DataFrame) -> ks.DataFrame["value": float]:
    return pd.DataFrame({"value": [df["C"].values[0] + 1.0]})

which I can then apply with different group-by keys:

df = ks.DataFrame({'A': ['a', 'a', 'b'], 'B': [1, 2, 3], 'C': [4, 6, 5]}, columns=['A', 'B', 'C'])

ver1 = df.groupby("A", as_index=False).apply(my_func)
ver2 = df.groupby("B", as_index=False).apply(my_func)

and get an A column in the output for ver1 and a B column in the output for ver2.
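
To make the requested output shape concrete, here is a minimal sketch in plain pandas (not Koalas). The helper name `apply_keep_key` is hypothetical and is not part of pandas or Koalas, and the function is assumed to read column C, which is present whichever of A or B is used as the key. It only illustrates the result the question asks for; it says nothing about how Koalas implements apply.

```python
import pandas as pd


def my_func(group: pd.DataFrame) -> pd.DataFrame:
    # One row per group, built from the first value of column C (assumed column).
    return pd.DataFrame({"value": [group["C"].values[0] + 1.0]})


def apply_keep_key(df: pd.DataFrame, key: str, func) -> pd.DataFrame:
    """Hypothetical helper: apply func to each group and keep the key as a column."""
    parts = []
    for key_value, group in df.groupby(key):
        out = func(group)
        # Prepend the group key as a regular column of the per-group result.
        out.insert(0, key, key_value)
        parts.append(out)
    return pd.concat(parts, ignore_index=True)


df = pd.DataFrame({"A": ["a", "a", "b"], "B": [1, 2, 3], "C": [4, 6, 5]})

ver1 = apply_keep_key(df, "A", my_func)  # columns: A, value
ver2 = apply_keep_key(df, "B", my_func)  # columns: B, value
```

The same function is reused unchanged for both keys, and only the key column in the result differs, which is the behavior being asked about for as_index=False.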

xinrong-meng commented 3 years ago

Hi @kismsu, would you please show a reproducible example to help us understand your question?

HyukjinKwon commented 3 years ago

This happens because type hints do not yet support the index, which is a limitation for now. @kismsu, you can simply omit the type hints and let Koalas infer the schema for the time being.