databricks / koalas

Koalas: pandas API on Apache Spark
Apache License 2.0

subset parameter in DataFrame.replace #1516

Closed beobest2 closed 4 years ago

beobest2 commented 4 years ago
>>> kdf.replace('Mjolnir', 'Stormbuster', subset=('weapon',))
                     name       weapon
0.342778          Ironman      Mark-45
0.087444  Captain America       Shield
0.179212             Thor  Stormbuster
0.522174             Hulk        Smash

>>> pdf.replace('Mjolnir', 'Stormbuster', subset=('weapon',))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: replace() got an unexpected keyword argument 'subset'

@HyukjinKwon When I tried adding the above test to the test cases, I found that pandas does not support the subset parameter. Checking older versions, I found that pandas has never supported it. The current Koalas replace signature matches Spark's replace instead. So, what do you think about removing subset from Koalas to stay consistent with pandas?
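For reference, a minimal pandas-only sketch of the point above: it checks that `DataFrame.replace` has no `subset` parameter and shows a pandas-native way to limit a replacement to one column with a nested dict. The frame below is an assumption reconstructed from the output shown earlier (default integer index, with 'Mjolnir' as Thor's original weapon).

```python
import inspect

import pandas as pd

# Assumed reconstruction of the example frame (default index, not the one shown above).
pdf = pd.DataFrame(
    {
        "name": ["Ironman", "Captain America", "Thor", "Hulk"],
        "weapon": ["Mark-45", "Shield", "Mjolnir", "Smash"],
    }
)

# pandas' DataFrame.replace signature does not include a `subset` parameter.
has_subset = "subset" in inspect.signature(pd.DataFrame.replace).parameters

# A nested dict {column: {old: new}} restricts the replacement to that column.
replaced = pdf.replace({"weapon": {"Mjolnir": "Stormbuster"}})

print(has_subset)
print(replaced)
```

The nested-dict form keeps the call compatible with pandas while still scoping the replacement to the weapon column.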

References:

pandas 0.9.0 DataFrame.replace

pandas 1.0.1 DataFrame.replace

pyspark 2.1.3 DataFrame.replace

_Originally posted by @beobest2 in https://github.com/_render_node/MDI0OlB1bGxSZXF1ZXN0UmV2aWV3Q29tbWVudDQyNzAxNjQ0Mw==/comments/review_comment_

HyukjinKwon commented 4 years ago

It seems that parameter was added to address https://github.com/databricks/koalas/pull/495#issuecomment-505129634. The current DataFrame.replace seems incomplete.

Can we support dict and list in to_replace? Then I think we can remove it.
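To illustrate why dict support in to_replace would make subset redundant, here is a rough pandas sketch. The helper `replace_in_columns` is hypothetical (it is not part of pandas or Koalas) and only handles a scalar `to_replace`; it simply rewrites a subset-style call as a per-column dict.

```python
import pandas as pd


def replace_in_columns(df, to_replace, value, columns):
    """Hypothetical helper: emulate a `subset` argument via a dict `to_replace`.

    Builds {column: to_replace} so the scalar is only looked up in the
    listed columns, then delegates to the regular DataFrame.replace.
    """
    return df.replace({col: to_replace for col in columns}, value)


pdf = pd.DataFrame({"A": [0, 1, 2], "B": [0, 6, 7]})

# Replace 0 with 100, but only in column "A"; the 0 in column "B" is kept.
result = replace_in_columns(pdf, 0, 100, ["A"])
print(result)
```

If Koalas accepts the same dict form, callers that relied on subset could presumably switch to it without losing functionality.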

beobest2 commented 4 years ago

I will check whether to_replace supports dict and list.

beobest2 commented 4 years ago

I have checked whether Koalas supports dict and list in to_replace, and it works fine. May I delete the subset parameter?

>>> pdf
   A  B  C
0  0  5  a
1  1  6  b
2  2  7  c
3  3  8  d
4  4  9  e
>>> kdf
   A  B  C
0  0  5  a
1  1  6  b
2  2  7  c
3  3  8  d
4  4  9  e

>>> pdf.replace({'A': 0, 'B': 6}, 100)
     A    B  C
0  100    5  a
1    1  100  b
2    2    7  c
3    3    8  d
4    4    9  e
>>> kdf.replace({'A': 0, 'B': 6}, 100)
     A    B  C
0  100    5  a
1    1  100  b
2    2    7  c
3    3    8  d
4    4    9  e

>>> pdf.replace({0: 100, 1: 1000})
      A  B  C
0   100  5  a
1  1000  6  b
2     2  7  c
3     3  8  d
4     4  9  e
>>> kdf.replace({0: 100, 1: 1000})
      A  B  C
0   100  5  a
1  1000  6  b
2     2  7  c
3     3  8  d
4     4  9  e

>>> pdf.replace([0, 1, 4], 100)
     A  B  C
0  100  5  a
1  100  6  b
2    2  7  c
3    3  8  d
4  100  9  e
>>> kdf.replace([0, 1, 4], 100)
     A  B  C
0  100  5  a
1  100  6  b
2    2  7  c
3    3  8  d
4  100  9  e
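The transcript above does not show how the frames were built. A minimal sketch of the pandas side, assuming the frame is constructed as below, reproduces the three checks; the Koalas counterpart would presumably be created from the same data (for example with `ks.from_pandas(pdf)`), which is not run here.

```python
import pandas as pd

# Assumed construction of the frame printed in the transcript above.
pdf = pd.DataFrame(
    {"A": [0, 1, 2, 3, 4], "B": [5, 6, 7, 8, 9], "C": list("abcde")}
)

# Dict keyed by column name: look for 0 in "A" and 6 in "B", replace both with 100.
per_column = pdf.replace({"A": 0, "B": 6}, 100)

# Dict keyed by value: replace 0 with 100 and 1 with 1000 wherever they occur.
mapping = pdf.replace({0: 100, 1: 1000})

# List of values: replace any of 0, 1, 4 with 100.
from_list = pdf.replace([0, 1, 4], 100)

print(per_column)
print(mapping)
print(from_list)
```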

HyukjinKwon commented 4 years ago

Yep, please go ahead