Closed ueshin closed 3 years ago
Fix groupby-apply and transform to support additional dtypes.
After this, additional dtypes (for example, category) can be specified in the return type annotation of the UDFs for groupby-apply and transform.
>>> kdf = ks.DataFrame(
...     {
...         "a": pd.Categorical([1, 2, 3, 1, 2, 3]),
...         "b": pd.Categorical(
...             ["b", "a", "c", "c", "b", "a"], categories=["c", "b", "d", "a"]
...         ),
...     },
... )
>>> def identity(df) -> ks.DataFrame[zip(kdf.columns, kdf.dtypes)]:
...     return df
...
>>> applied = kdf.groupby("a").apply(identity)
>>> applied
   a  b
0  2  a
1  2  b
2  3  c
3  3  a
4  1  b
5  1  c
>>> applied.dtypes
a    category
b    category
dtype: object
FYI: without the fix:
>>> applied
   a  b
0  1  3
1  1  1
2  2  0
3  2  3
4  0  1
5  0  0
>>> applied.dtypes
a    int64
b    int64
dtype: object
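A note on reading the output without the fix: the integers appear to be the categorical codes, that is, the position of each value in its category list, rather than the values themselves. The pure-Python sketch below is only an illustration of that mapping for column "b"; it is not the Koalas implementation, and the names `code_of`, `codes`, and `decoded` are made up for this example. pandas exposes the same integers through `pd.Categorical(...).codes`.

```python
# Values and category order copied from column "b" in the example above.
values = ["b", "a", "c", "c", "b", "a"]
categories = ["c", "b", "d", "a"]

# A categorical code is the position of the value in the category list.
code_of = {cat: i for i, cat in enumerate(categories)}
codes = [code_of[v] for v in values]
print(codes)  # [1, 3, 0, 0, 1, 3]

# Mapping the codes back to values requires the category list,
# which is the information carried by the category dtype.
decoded = [categories[c] for c in codes]
print(decoded == values)  # True
```

The codes 0, 1, and 3 are the same integers that show up in column "b" without the fix, which suggests the category information was being dropped on the way back from the UDF.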
Looks great! Thank you!
Thanks! merging.