databricks / koalas

Koalas: pandas API on Apache Spark
Apache License 2.0

Fix DataFrame.apply to support additional dtypes. #2125

Closed. ueshin closed this 3 years ago

ueshin commented 3 years ago

With this change, the return type annotation of a UDF passed to DataFrame.apply can specify additional dtypes, such as pd.CategoricalDtype:

>>> import pandas as pd
>>> import databricks.koalas as ks
>>> kdf = ks.DataFrame(
...     {"a": ["a", "b", "c", "a", "b", "c"], "b": ["b", "a", "c", "c", "b", "a"]}
... )
>>> dtype = pd.CategoricalDtype(categories=["a", "b", "c"])
>>> def categorize(ser) -> ks.Series[dtype]:
...     return ser.astype(dtype)
...
>>> applied = kdf.apply(categorize)
>>> applied
   a  b
0  a  b
1  b  a
2  c  c
3  a  c
4  b  b
5  c  a
>>> applied.dtypes
a    category
b    category
dtype: object

FYI, without the fix the same call returns integers instead of the categorical values:

>>> applied
   a  b
0  0  1
1  1  0
2  2  2
3  0  2
4  1  1
5  2  0
>>> applied.dtypes
a    int64
b    int64
dtype: object
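
For anyone wondering where those integers come from: the values in the unfixed output match the categorical codes pandas assigns for the categories ["a", "b", "c"]. A minimal pandas-only sketch (no Spark needed; `pdf`, `categorized`, and `codes` are names chosen here for illustration and are not part of the patch):

```python
import pandas as pd

# Same data and dtype as in the description above, but in plain pandas.
pdf = pd.DataFrame(
    {"a": ["a", "b", "c", "a", "b", "c"], "b": ["b", "a", "c", "c", "b", "a"]}
)
dtype = pd.CategoricalDtype(categories=["a", "b", "c"])

# Cast every column to the categorical dtype.
categorized = pdf.astype(dtype)

# Each category is backed by an integer code: "a" -> 0, "b" -> 1, "c" -> 2.
codes = pd.DataFrame(
    {col: categorized[col].cat.codes for col in categorized.columns}
)

print(codes["a"].tolist())  # [0, 1, 2, 0, 1, 2]
print(codes["b"].tolist())  # [1, 0, 2, 2, 1, 0]
```

These lists line up with the columns of the unfixed output. Note that pandas itself reports the codes as int8; the int64 shown above is what Koalas returned, and how the codes ended up with that width is not something this sketch tries to reproduce.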

codecov-io commented 3 years ago

Codecov Report

Merging #2125 (5446ec7) into master (c9e4791) will decrease coverage by 1.21%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #2125      +/-   ##
==========================================
- Coverage   95.37%   94.15%   -1.22%     
==========================================
  Files          60       60              
  Lines       13591    13477     -114     
==========================================
- Hits        12962    12689     -273     
- Misses        629      788     +159     

Impacted Files                                      Coverage Δ
databricks/koalas/frame.py                          96.52% <100.00%> (-0.01%)
databricks/koalas/usage_logging/__init__.py         28.20% <0.00%> (-64.36%)
databricks/koalas/usage_logging/usage_logger.py     47.82% <0.00%> (-52.18%)
databricks/conftest.py                              90.90% <0.00%> (-9.10%)
databricks/koalas/__init__.py                       84.21% <0.00%> (-7.90%)
databricks/koalas/typedef/typehints.py              89.28% <0.00%> (-6.13%)
databricks/koalas/tests/indexes/test_base.py        97.15% <0.00%> (-2.85%)
databricks/koalas/testing/utils.py                  78.53% <0.00%> (-2.53%)
databricks/koalas/tests/indexes/test_datetime.py    97.70% <0.00%> (-2.30%)
databricks/koalas/tests/indexes/test_category.py    98.21% <0.00%> (-1.79%)
... and 23 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update c9e4791...5446ec7.

xinrong-meng commented 3 years ago

LGTM, thank you!

ueshin commented 3 years ago

Thanks! merging.