databricks / koalas

Koalas: pandas API on Apache Spark
Apache License 2.0

Fix Series.koalas.transform_batch to support additional dtypes and reuse it. #2127

Closed: ueshin closed this pull request 3 years ago.

ueshin commented 3 years ago

Fix Series.koalas.transform_batch to support additional dtypes and reuse it in Series.transform and DataFrame.transform.

After this change, additional dtypes can be specified in the return type annotation of the UDFs passed to Series.koalas.transform_batch, Series.transform, and DataFrame.transform.
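One way a return type annotation can carry a dtype is for the subscripted type to record it, so the caller can read it back from the function's signature. Below is a minimal plain-Python sketch of that general idea. It assumes nothing about Koalas internals: `SeriesOf` and `return_dtype` are hypothetical names, not part of the Koalas API or the actual implementation of `ks.Series[...]`.

```python
import inspect

import pandas as pd


class SeriesOf:
    """Hypothetical stand-in for ks.Series[...]: subscripting records a dtype."""

    dtype = None  # unsubscripted SeriesOf carries no dtype

    def __class_getitem__(cls, dtype):
        # Build a throwaway subclass that remembers the requested dtype.
        return type(f"SeriesOf[{dtype}]", (cls,), {"dtype": dtype})


def return_dtype(func):
    """Hypothetical helper: read the dtype recorded in func's return annotation."""
    ann = inspect.signature(func).return_annotation
    if isinstance(ann, type) and issubclass(ann, SeriesOf):
        return ann.dtype
    return None  # no annotation, or an annotation that carries no dtype


dtype = pd.CategoricalDtype(categories=["a", "b", "c", "d"])


def to_category(pser: pd.Series) -> SeriesOf[dtype]:
    return pser.astype(dtype)


declared = return_dtype(to_category)  # the CategoricalDtype from the annotation
result = to_category(pd.Series(["a", "b", "c"]))
```

Because the dtype is known from the signature alone, a caller can learn the output type without running the function, which is useful when the data is distributed and only seen in pieces.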

>>> import pandas as pd
>>> import databricks.koalas as ks
>>> kdf = ks.DataFrame(
...     {"a": ["a", "b", "c", "a", "b", "c"], "b": ["b", "a", "c", "c", "b", "a"]}
... )
>>> dtype = pd.CategoricalDtype(categories=["a", "b", "c", "d"])
>>> def to_category(pser) -> ks.Series[dtype]:
...   return pser.astype(dtype)
...
>>> applied = kdf.a.koalas.transform_batch(to_category)
>>> applied
0    a
1    b
2    c
3    a
4    b
5    c
Name: a, dtype: category
Categories (4, object): ['a', 'b', 'c', 'd']
>>> applied.dtype
CategoricalDtype(categories=['a', 'b', 'c', 'd'], ordered=False)
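The example spells out all four categories, including the unused "d", instead of calling `astype("category")`. Since the function receives the column in batches, categories inferred from the values would be computed per batch and could differ from batch to batch. The plain-pandas sketch below imitates batch-wise execution with a hypothetical `simulate_transform_batch` helper; slicing plus `pd.concat` is an assumption for illustration, not how Koalas actually combines results.

```python
import pandas as pd

# Explicit dtype, as in the ks.Series[dtype] annotation above.
dtype = pd.CategoricalDtype(categories=["a", "b", "c", "d"])


def to_category(pser: pd.Series) -> pd.Series:
    return pser.astype(dtype)


def simulate_transform_batch(pser, func, batch_size):
    """Hypothetical helper: apply func to consecutive slices, then concatenate.

    This only mimics batch-wise execution in plain pandas; it is not Koalas code.
    """
    batches = [pser.iloc[i:i + batch_size] for i in range(0, len(pser), batch_size)]
    return pd.concat([func(batch) for batch in batches])


pser = pd.Series(["a", "b", "c", "a", "b", "c"], name="a")

# Explicit dtype: every batch shares the same categories, so the result stays categorical.
explicit = simulate_transform_batch(pser, to_category, batch_size=2)

# Per-batch inference: batches get categories [a, b], [a, c], [b, c],
# and concatenating categoricals with different categories falls back to object dtype.
inferred = simulate_transform_batch(pser, lambda s: s.astype("category"), batch_size=2)
```

Declaring the full `CategoricalDtype` up front is what keeps the categories consistent regardless of how the data happens to be split.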

codecov-io commented 3 years ago

Codecov Report

Merging #2127 (e051950) into master (6fae0cb) will decrease coverage by 0.03%. The diff coverage is 84.37%.


@@            Coverage Diff             @@
##           master    #2127      +/-   ##
==========================================
- Coverage   95.37%   95.34%   -0.04%     
==========================================
  Files          60       60              
  Lines       13606    13634      +28     
==========================================
+ Hits        12977    12999      +22     
- Misses        629      635       +6     
Impacted Files                        Coverage Δ
databricks/koalas/groupby.py          94.34% <0.00%> (ø)
databricks/koalas/accessors.py        91.98% <85.18%> (-1.52%) ↓
databricks/koalas/frame.py            96.49% <100.00%> (-0.06%) ↓
databricks/koalas/series.py           96.92% <100.00%> (-0.02%) ↓
databricks/koalas/missing/frame.py    100.00% <0.00%> (ø)
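The summary figures are plain ratios: line coverage is hits divided by tracked lines. This short Python check uses only the numbers in the report; the helper names are hypothetical, and truncating to two decimals is simply what reproduces the printed values, not a documented Codecov rule.

```python
import math


def coverage_pct(hits, lines):
    """Line coverage as a percentage: covered lines / tracked lines * 100."""
    return 100 * hits / lines


def truncate(x, digits=2):
    """Cut (not round) x to the given number of decimal places."""
    factor = 10 ** digits
    return math.floor(x * factor) / factor


master = {"lines": 13606, "hits": 12977, "misses": 629}
pr = {"lines": 13634, "hits": 12999, "misses": 635}

for report in (master, pr):
    # Every tracked line is either a hit or a miss.
    assert report["hits"] + report["misses"] == report["lines"]

before = coverage_pct(master["hits"], master["lines"])  # about 95.377
after = coverage_pct(pr["hits"], pr["lines"])           # about 95.342
drop = before - after                                   # about 0.035
```

The 28 new lines split into 22 hits and 6 misses, which is why overall coverage dips slightly even though most of the new code is covered.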


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 6fae0cb...e051950.

xinrong-meng commented 3 years ago

I may need more time to fully understand the code paths. However, the code changes in this PR look great! Thank you!

ueshin commented 3 years ago

Thanks! Merging.