databricks / koalas

Koalas: pandas API on Apache Spark
Apache License 2.0

I can't use .shift() on columns that hold lists as values. #2207

Open amorimds opened 2 years ago

amorimds commented 2 years ago
import databricks.koalas as ks

kdf = ks.DataFrame({'A': [1, 1, 2, 2], 'B': [[1, 1, 2, 2], [1, 1, 2, 2], [1, 1, 2, 2], [1, 1, 2, 2]]}, columns=['A', 'B'])
kdf.groupby('A')['B'].shift(1)
amorimds commented 2 years ago

I could work around this problem by casting the column to a string before the shift and parsing it back afterwards:

kdf = ks.DataFrame({'A': [1, 1, 2, 2], 'B': [[1, 1, 2, 2], [1, 1, 2, 2], [1, 1, 2, 2], [1, 1, 2, 2]]}, columns=['A', 'B'])
kdf['B'] = kdf['B'].astype(str)
kdf.groupby('A')['B'].shift(1).apply(lambda x: eval(str(x)))
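The workaround above relies on eval, which executes whatever string it is given. A minimal sketch of a safer parsing step follows, assuming the cells are plain list literals and that rows shifted out of a group arrive as None, NaN, or the string 'None'. The helper name parse_list_cell is hypothetical and is not part of Koalas, and the snippet has not been run against a Spark cluster.

```python
import ast
from typing import Any, List, Optional


def parse_list_cell(cell: Any) -> Optional[List[Any]]:
    """Turn a stringified list such as '[1, 1, 2, 2]' back into a list.

    Hypothetical helper: missing values (None, NaN, or the strings
    'None' / 'nan') are mapped to None instead of being evaluated.
    """
    if cell is None:
        return None
    if isinstance(cell, float) and cell != cell:  # NaN is the only value != itself
        return None
    text = str(cell).strip()
    if text in ("", "None", "nan", "NaN"):
        return None
    # literal_eval only accepts Python literals, so arbitrary code is rejected.
    value = ast.literal_eval(text)
    if not isinstance(value, list):
        raise ValueError(f"expected a list literal, got {type(value).__name__}")
    return value


# Intended use in the workaround (assumption, untested on Spark):
#   kdf.groupby('A')['B'].shift(1).apply(parse_list_cell)
print(parse_list_cell("[1, 1, 2, 2]"))
print(parse_list_cell(None))
```

Unlike eval, ast.literal_eval raises an error on anything that is not a literal, so an unexpected string in the column fails loudly rather than being executed.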
itholic commented 2 years ago

Thanks for the report, @amorimds.

The Koalas project is currently in maintenance mode only, so responses here may be quite delayed.

The project is now being developed more actively inside PySpark under the name "pandas API on Spark". You can re-use your existing Koalas code by changing the import to: import pyspark.pandas as ks

So if you're going to keep using the Koalas API, I recommend switching to PySpark! (You can get a quicker response if you report the issue to the Apache Spark JIRA.)