databricks / koalas

Koalas: pandas API on Apache Spark
Apache License 2.0

Fix DataFrame.koalas.transform_batch to support additional dtypes. #2132

Closed. ueshin closed this 3 years ago.

ueshin commented 3 years ago

Fix DataFrame.koalas.transform_batch to support additional dtypes.

After this change, additional dtypes such as pandas' CategoricalDtype can be specified in the return-type annotation of UDFs passed to DataFrame.koalas.transform_batch, and the resulting Koalas DataFrame keeps those dtypes.

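>>> import pandas as pd
>>> import databricks.koalas as ks
>>> kdf = ks.DataFrame(
...     {"a": ["a", "b", "c", "a", "b", "c"], "b": ["b", "a", "c", "c", "b", "a"]}
... )
>>> dtype = pd.CategoricalDtype(categories=["a", "b", "c", "d"])
>>> # Declare the output dtype of each column in the return-type annotation.
>>> def to_category(pdf) -> ks.DataFrame["a":dtype, "b":dtype]:
...     return pdf.astype(dtype)
...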
>>> applied = kdf.koalas.transform_batch(to_category)
>>> applied
   a  b
0  a  b
1  b  a
2  c  c
3  a  c
4  b  b
5  c  a
>>> applied.dtypes
a    category
b    category
dtype: object
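Because transform_batch hands each internal batch to the UDF as a plain pandas DataFrame, the UDF body can be checked locally with pandas alone before running it on Spark. The following is a minimal sketch, assuming only pandas is installed; the batch contents are made up for illustration, and the Koalas return-type annotation is dropped since it only matters to Koalas.

```python
import pandas as pd

# The category set declared in the UDF's return-type annotation.
dtype = pd.CategoricalDtype(categories=["a", "b", "c", "d"])

def to_category(pdf: pd.DataFrame) -> pd.DataFrame:
    # Same body as the UDF above, without the Koalas-specific annotation.
    return pdf.astype(dtype)

# A hypothetical batch shaped like a slice of kdf.
batch = pd.DataFrame({"a": ["a", "b", "c"], "b": ["b", "a", "c"]})
result = to_category(batch)

# transform_batch expects each output batch to have the same length as its input.
assert len(result) == len(batch)
# Each column should carry exactly the declared dtype, including the unused category "d".
assert all(result[col].dtype == dtype for col in ["a", "b"])
print(result.dtypes)
```

If these checks pass locally, the dtypes declared in the annotation match what the function actually returns, which is what Koalas relies on when it builds the result without inferring the schema.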

xinrong-meng commented 3 years ago

Looks great, thank you!

ueshin commented 3 years ago

Thanks! Merging.