databricks / koalas

Koalas: pandas API on Apache Spark
Apache License 2.0

Some arithmetic operators for IndexOpsMixin don't work with magic method. #1235

Open itholic opened 4 years ago

itholic commented 4 years ago

Assume we have a Series like the one below.

>>> pser = pd.Series([1, 2, 3])
>>> kser = ks.from_pandas(pser)

Some arithmetic operations then fail when the magic method is called directly, as shown below.

>>> pser.__rfloordiv__(pser)  # it's okay for pandas
0    1
1    1
2    1
Name: 0, dtype: int64

>>> kser.__rfloordiv__(kser)  # but doesn't work for koalas
Traceback (most recent call last):
...
TypeError: Column is not iterable

>>> pser.__rsub__(pser)
0    0
1    0
2    0
Name: 0, dtype: int64

>>> kser.__rsub__(kser)
Traceback (most recent call last):
...
TypeError: Column is not iterable

I think more cases have the same issue.

itholic commented 4 years ago

I'm working on a fix for this.

HyukjinKwon commented 4 years ago

Does it throw the same exception when we perform arithmetic operators (e.g., a - b vs b - a)?

itholic commented 4 years ago

They work fine with arithmetic operators, as in your example.

The exception is raised only when a magic method is called directly with a Series or Index argument, as shown below.

>>> kser.__rfloordiv__(100)
0    100
1     50
2     33
Name: 0, dtype: int64

>>> kser.__rfloordiv__('hello')
0    None
1    None
2    None
Name: 0, dtype: object

>>> kser.__rfloordiv__(kser)
Traceback (most recent call last):
...
TypeError: Column is not iterable

>>> kidx.__rfloordiv__(100)
Int64Index([100, 50, 33], dtype='int64')

>>> kidx.__rfloordiv__('hello')
Index([None, None, None], dtype='object')

>>> kidx.__rfloordiv__(kidx)
Traceback (most recent call last):
...
TypeError: Column is not iterable

Any ideas?

HyukjinKwon commented 4 years ago

It might be better to fix this anyway, but I was wondering whether it causes a real issue, because these dunder methods are not supposed to be called directly.
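
For context, here is a minimal pure-Python sketch of how operator dispatch reaches the reflected methods. It may explain why `a - b` works even though `kser.__rsub__(kser)` fails. The `Tracked` class is a hypothetical stand-in, not Koalas code. It only records which dunder method Python invokes. When both operands have the same type, Python calls `__sub__` on the left operand and never consults `__rsub__`, so a reflected method normally receives a same-type argument only when it is called by hand.

```python
class Tracked:
    """Hypothetical Series-like stand-in that records which dunder method ran."""

    def __init__(self, value):
        self.value = value
        self.calls = []

    def __sub__(self, other):
        self.calls.append("__sub__")
        other_value = other.value if isinstance(other, Tracked) else other
        return Tracked(self.value - other_value)

    def __rsub__(self, other):
        # Python calls this for `other - self` only after the left operand's
        # __sub__ has returned NotImplemented.
        self.calls.append("__rsub__")
        other_value = other.value if isinstance(other, Tracked) else other
        return Tracked(other_value - self.value)


a, b = Tracked(5), Tracked(3)
same_type = a - b      # dispatches to a.__sub__(b); b.__rsub__ is not consulted
scalar_left = 100 - b  # int.__sub__ returns NotImplemented, so b.__rsub__(100) runs

print(a.calls, b.calls)
```

Under this dispatch rule, the operator syntax reaches a reflected method only when the left operand is of another type, such as a plain scalar, which would match the cases reported as working above.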

itholic commented 4 years ago

Yes, right. I just found them by coincidence, and I also don't think fixing this is a high priority for now.