Distributed DataFrame: Productivity = Power x Simplicity For Scientists & Engineers, on any Data Engine
the implementation of BasicStatisticsComputer.GetSummaryMapper is inefficient #38
Open
Huandao0812 opened 10 years ago
getSummaryImpl should use RDD[Row] to compute the summary. Also, the if/else chain at https://github.com/ddf-project/DDF/blob/master/spark/src/main/java/io/spark/ddf/analytics/BasicStatisticsComputer.java#L54 is inefficient and unnecessary: we already have the column type, so there is no need to check each value manually with instanceof.
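
To illustrate the idea, here is a rough sketch of choosing a converter once per column from the schema, instead of running instanceof checks on every cell. It is not the DDF code: `ColumnSummarySketch`, `ColumnType`, `Converter`, and `summarize` are hypothetical names, plain `Object[]` rows stand in for Spark's `Row`, and only count and sum are tracked to keep it short.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not the actual BasicStatisticsComputer implementation.
final class ColumnSummarySketch {

    // Assumed column types; the real DDF schema types are not shown in this issue.
    enum ColumnType { INT, DOUBLE, STRING }

    // Assumed helper: turns a cell of a numeric column into a double.
    interface Converter {
        double toDouble(Object cell);
    }

    // Decide how to read each column once, from the schema, before touching any row.
    static Converter[] convertersFor(ColumnType[] schema) {
        Converter[] converters = new Converter[schema.length];
        for (int i = 0; i < schema.length; i++) {
            switch (schema[i]) {
                case INT:
                case DOUBLE:
                    converters[i] = cell -> ((Number) cell).doubleValue();
                    break;
                default:
                    converters[i] = null; // non-numeric column, skipped in the summary
            }
        }
        return converters;
    }

    // Returns, for each column, a two-element array {count, sum} over non-null numeric cells.
    static double[][] summarize(List<Object[]> rows, ColumnType[] schema) {
        Converter[] converters = convertersFor(schema);
        double[][] stats = new double[schema.length][2];
        for (Object[] row : rows) {
            for (int i = 0; i < converters.length; i++) {
                // No per-cell instanceof chain: the column type already chose the converter.
                if (converters[i] == null || row[i] == null) {
                    continue;
                }
                stats[i][0] += 1;
                stats[i][1] += converters[i].toDouble(row[i]);
            }
        }
        return stats;
    }
}
```

Calling `ColumnSummarySketch.summarize(rows, schema)` with a list of `Object[]` rows and a matching `ColumnType[]` gives one `{count, sum}` pair per column. In the Spark version the same per-column dispatch could presumably happen inside the mapper over RDD[Row], with the schema captured once outside the per-row loop.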