Closed nitinmnsn closed 2 years ago
Let me close this since it's a duplicate of https://github.com/databricks/koalas/issues/2201.
And this is technically a bug in the pandas API on Spark (pyspark.pandas), not in Koalas, so it should be filed in the Apache Spark JIRA.
Of course, Koalas and pandas on Spark behave almost identically, but they should be treated as separate projects since pandas on Spark is much more actively maintained now.
The pyspark pandas groupby aggregate API behaves differently depending on whether the dataframe is a pyspark.sql.dataframe.DataFrame or a pyspark.pandas.frame.DataFrame. Is this intended behaviour? Also, how do I run groupby .agg() if the dataframe is a pyspark.pandas.frame.DataFrame? It seems like registering a pandas_udf is necessary to run them.

Output: ValueError: aggs must be a dict mapping from column name to aggregate functions (string or list of strings).
However, if I create the dataframe with pyspark directly, without using the pandas API, the exact same code works without any errors.