Description
When aggregating over data frames, we should avoid aggregators that require dropping down to an RDD (df.rdd.aggregate(...)). Where possible, we should replace these patterns with UDAFs (user-defined aggregate functions), which simplifies the code considerably because we no longer have to work with Row objects.
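To make the proposed change concrete, here is a minimal sketch that contrasts the RDD-based pattern with a UDAF-based one. It assumes Spark 3.x, where a typed Aggregator can be registered through functions.udaf; on older versions the equivalent would likely be a UserDefinedAggregateFunction subclass. The names MeanBuffer, MeanAgg, UdafSketch and the columns key and value are hypothetical and chosen only for illustration; they do not come from the code base this ticket refers to.

```scala
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
import org.apache.spark.sql.expressions.Aggregator
import org.apache.spark.sql.functions.{col, udaf}

// Intermediate aggregation state: running sum and element count.
case class MeanBuffer(sum: Double, count: Long)

// Hypothetical example aggregator: arithmetic mean of a Double column.
object MeanAgg extends Aggregator[Double, MeanBuffer, Double] {
  def zero: MeanBuffer = MeanBuffer(0.0, 0L)
  def reduce(b: MeanBuffer, x: Double): MeanBuffer =
    MeanBuffer(b.sum + x, b.count + 1)
  def merge(a: MeanBuffer, b: MeanBuffer): MeanBuffer =
    MeanBuffer(a.sum + b.sum, a.count + b.count)
  def finish(b: MeanBuffer): Double =
    if (b.count == 0) Double.NaN else b.sum / b.count
  def bufferEncoder: Encoder[MeanBuffer] = Encoders.product[MeanBuffer]
  def outputEncoder: Encoder[Double] = Encoders.scalaDouble
}

object UdafSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session is good enough for a demonstration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("udaf-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(("a", 1.0), ("a", 3.0), ("b", 10.0)).toDF("key", "value")

    // Pattern to avoid: leave the DataFrame API and unpack Row objects by hand.
    val (sum, count) = df.rdd.aggregate((0.0, 0L))(
      (acc, row) => (acc._1 + row.getAs[Double]("value"), acc._2 + 1),
      (a, b) => (a._1 + b._1, a._2 + b._2)
    )
    println(sum / count)

    // Proposed pattern: wrap the aggregator as a UDAF and stay in the DataFrame API.
    val mean = udaf(MeanAgg)
    df.groupBy("key").agg(mean(col("value")).as("mean_value")).show()

    spark.stop()
  }
}
```

In the UDAF variant the aggregation logic lives in plain typed methods (zero, reduce, merge, finish), so it can be unit-tested without a Spark session and reused in groupBy or agg calls without touching Row.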
Prerequisites