Fix DataFrame.koalas.apply_batch to support additional dtypes.
With this fix, additional dtypes such as pd.CategoricalDtype can be specified in the return type annotation of UDFs passed to DataFrame.koalas.apply_batch:
>>> kdf = ks.DataFrame(
...     {"a": ["a", "b", "c", "a", "b", "c"], "b": ["b", "a", "c", "c", "b", "a"]}
... )
>>> dtype = pd.CategoricalDtype(categories=["a", "b", "c", "d"])
>>> def to_category(pdf) -> ks.DataFrame["a": dtype, "b": dtype]:
...     return pdf.astype(dtype)
...
>>> applied = kdf.koalas.apply_batch(to_category)
>>> applied
   a  b
0  a  b
1  b  a
2  c  c
3  a  c
4  b  b
5  c  a
>>> applied.dtypes
a    category
b    category
dtype: object
FYI, without the fix the result holds integer category codes instead of the categories:
>>> applied
   a  b
0  0  1
1  1  0
2  2  2
3  0  2
4  1  1
5  2  0
>>> applied.dtypes
a    int64
b    int64
dtype: object
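The integers in the "without the fix" output match the category codes that pandas assigns for this dtype. The sketch below uses plain pandas only (no Koalas) to show how categories map to codes and how the codes can be mapped back when the dtype is known. It illustrates the relationship between the two outputs and is not a description of how the fix is implemented inside Koalas; the variable names `codes` and `restored` are introduced here for illustration.

```python
import pandas as pd

dtype = pd.CategoricalDtype(categories=["a", "b", "c", "d"])
pdf = pd.DataFrame(
    {"a": ["a", "b", "c", "a", "b", "c"], "b": ["b", "a", "c", "c", "b", "a"]}
)

# Cast to the categorical dtype, as the UDF in the example does.
cat = pdf.astype(dtype)

# Each value is stored as an integer code indexing into dtype.categories.
codes = cat.apply(lambda col: col.cat.codes)

# Given the dtype, the codes can be turned back into categorical columns.
restored = pd.DataFrame(
    {name: pd.Categorical.from_codes(codes[name], dtype=dtype) for name in codes.columns}
)

print(codes)
print(restored.dtypes)
```

Here `codes` holds the same integers as the "without the fix" output, while `restored` has category columns again, like the output with the fix.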