databricks / koalas

Koalas: pandas API on Apache Spark
Apache License 2.0

Feature Request for using lambda inside of .loc[] #2200

Open kylegilde opened 2 years ago

kylegilde commented 2 years ago

Hello!

Would you consider adding support for passing a callable, such as a lambda, to .loc[] so that DataFrames and Series can reference themselves while being indexed? I think it's one of the most convenient features in pandas.

Thank you


import numpy as np
import pandas as pd
my_df = pd.DataFrame(
    {
        "a": pd.Series([1, 2, 3], dtype=np.dtype("int32")),
        "b": pd.Series(["x", "y", "z"], dtype=np.dtype("O")),
        "c": pd.Series([True, False, np.nan], dtype=np.dtype("O")),
        "d": pd.Series(["h", "i", np.nan], dtype=np.dtype("O")),
        "e": pd.Series([10, np.nan, 20], dtype=np.dtype("float")),
        "f": pd.Series([np.nan, 100.5, 200], dtype=np.dtype("float")),
    }
)
print(my_df.loc[lambda df: (df.a == 3) | (df.b == 'x')])
#    a  b     c    d    e     f
# 0  1  x  True    h 10.0   NaN
# 2  3  z   NaN  NaN 20.0 200.0
print(my_df.a.loc[lambda s: s < 2])
# 0    1
# Name: a, dtype: int32
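
For comparison, the callable form above can be emulated by evaluating the callable first and passing the resulting boolean mask to .loc[]. The sketch below is only an illustration in plain pandas: `loc_where` is a hypothetical helper name, not part of pandas or Koalas, and whether the same pattern behaves identically on a Koalas DataFrame is an assumption that has not been verified here.

```python
import pandas as pd


def loc_where(obj, predicate):
    """Hypothetical helper: emulate obj.loc[predicate] for a callable predicate."""
    mask = predicate(obj)  # evaluate the callable against the object itself
    return obj.loc[mask]   # fall back to ordinary boolean-mask indexing


df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# Same rows as df.loc[lambda d: (d.a == 3) | (d.b == "x")] in pandas.
filtered = loc_where(df, lambda d: (d.a == 3) | (d.b == "x"))

# The helper also accepts a Series.
small = loc_where(df.a, lambda s: s < 2)

# Inside a method chain, pipe() passes the intermediate frame to the helper,
# so the predicate can refer to the column "g" created one step earlier.
chained = df.assign(g=lambda d: d.a * 2).pipe(loc_where, lambda d: d.g > 2)
```

The chained case is where the self-reference matters most: the intermediate DataFrame has no variable name, so a callable is the only way to filter on it without breaking the chain.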