databricks / koalas

Koalas: pandas API on Apache Spark
Apache License 2.0

[FIX] _builtin_table import in groupby apply (changed in pandas>=1.3.0) #2184

Closed Cedric-Magnan closed 2 years ago

Cedric-Magnan commented 2 years ago

I recently hit the following error after upgrading pandas to the latest version in my Databricks environment:

AttributeError: type object 'SelectionMixin' has no attribute '_builtin_table'

pandas has recently moved _builtin_table: it is now part of the pandas.core.common module instead of being an attribute of the pandas.core.base.SelectionMixin class.

The PR that made this change was merged in pandas 1.3.0: https://github.com/pandas-dev/pandas/pull/40857

The change in this PR should fix the issue.

I've just seen that this issue was fixed 6 days ago in the python/pyspark/pandas/groupby.py module of the Spark repository: https://github.com/apache/spark/pull/33598/files
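
For readers hitting the same AttributeError, here is a minimal sketch of a version-tolerant lookup based on the two locations described above. It is not the exact code merged in this PR or in the Spark one; the helper name load_builtin_table is hypothetical, and returning None when neither location is available is an assumption made so the sketch runs in any environment.

```python
def load_builtin_table():
    """Return pandas' private _builtin_table mapping, or None if unavailable.

    Hypothetical helper for illustration only; _builtin_table is a private
    pandas attribute and may move or disappear between releases.
    """
    try:
        # Location described above for pandas >= 1.3.0.
        from pandas.core.common import _builtin_table
        return _builtin_table
    except ImportError:
        # Either pandas is missing or the name is not in this module.
        pass

    try:
        # Location described above for pandas < 1.3.0.
        from pandas.core.base import SelectionMixin
        return SelectionMixin._builtin_table
    except (ImportError, AttributeError):
        # Assumption: signal "not found" with None instead of raising.
        return None


builtin_table = load_builtin_table()
```

Trying the import and falling back avoids parsing pd.__version__, at the cost of silently masking an unexpected layout; an explicit version check is an equally reasonable choice.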

HyukjinKwon commented 2 years ago

@Cedric-Magnan would you mind bringing the changes to this PR?

Cedric-Magnan commented 2 years ago

> @Cedric-Magnan would you mind bringing the changes to this PR?

@HyukjinKwon Comments from the pyspark repository have been applied to this PR in the latest commit. Don't hesitate to tell me when you think the fix can be merged.

HyukjinKwon commented 2 years ago

cc @ueshin

codecov-commenter commented 2 years ago

Codecov Report

Merging #2184 (110acf5) into master (46c80e6) will decrease coverage by 0.19%. The diff coverage is 81.81%.


@@            Coverage Diff             @@
##           master    #2184      +/-   ##
==========================================
- Coverage   95.36%   95.16%   -0.20%     
==========================================
  Files          60       60              
  Lines       13707    13746      +39     
==========================================
+ Hits        13071    13081      +10     
- Misses        636      665      +29     
| Impacted Files | Coverage Δ |
|---|---|
| databricks/koalas/groupby.py | 93.95% <81.81%> (-0.40%) |
| databricks/koalas/__init__.py | 86.58% <0.00%> (-4.45%) |
| databricks/koalas/typedef/typehints.py | 92.46% <0.00%> (-2.95%) |
| databricks/conftest.py | 98.43% <0.00%> (-1.57%) |
| databricks/koalas/plot/plotly.py | 95.95% <0.00%> (-0.89%) |
| databricks/koalas/plot/matplotlib.py | 91.63% <0.00%> (-0.37%) |
| databricks/koalas/indexes/multi.py | 93.10% <0.00%> (-0.35%) |
| databricks/koalas/tests/indexes/test_base.py | 99.70% <0.00%> (-0.30%) |
| databricks/koalas/generic.py | 93.12% <0.00%> (-0.21%) |
| databricks/koalas/namespace.py | 84.55% <0.00%> (-0.20%) |

... and 6 more


HyukjinKwon commented 2 years ago

Merged. Thanks @Cedric-Magnan.