the implementation of BasicStatisticsComputer.GetSummaryMapper is inefficient

ddf-project / DDF

Distributed DataFrame: Productivity = Power x Simplicity For Scientists & Engineers, on any Data Engine

http://ddf.io

Apache License 2.0

168 stars 42 forks

Open Huandao0812 opened 10 years ago

Huandao0812 commented 10 years ago

getSummaryImpl should compute the summary over an RDD[Row]. Also, the if-else chain at https://github.com/ddf-project/DDF/blob/master/spark/src/main/java/io/spark/ddf/analytics/BasicStatisticsComputer.java#L54 is inefficient and unnecessary: we already have each column's type, so there is no need to check each value manually with instanceof.
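To illustrate the idea, here is a minimal sketch of picking the conversion once per column from the known column type, instead of testing every cell with instanceof. It is plain Java with no Spark dependency, and every name in it (ColumnSummarySketch, ColumnType, Summary, extractorFor, summarize) is hypothetical rather than part of the DDF code base. Rows are modeled as Object[] as a stand-in for Row, and leaving non-numeric columns without a summary is an assumption of the sketch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch: none of these names come from the DDF code base.
class ColumnSummarySketch {

    // Stand-in for the column types that the schema already records.
    enum ColumnType { INT, LONG, DOUBLE, STRING }

    // Running statistics for one numeric column.
    static final class Summary {
        long count;
        long naCount;
        double sum;
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;

        void add(double x) {
            count++;
            sum += x;
            min = Math.min(min, x);
            max = Math.max(max, x);
        }

        double mean() {
            return count == 0 ? Double.NaN : sum / count;
        }
    }

    // Chosen once per column from the schema; returns null for non-numeric types.
    static ToDoubleFunction<Object> extractorFor(ColumnType type) {
        switch (type) {
            case INT:    return cell -> ((Integer) cell).doubleValue();
            case LONG:   return cell -> ((Long) cell).doubleValue();
            case DOUBLE: return cell -> ((Double) cell).doubleValue();
            default:     return null;
        }
    }

    // Single pass over the rows; the per-cell work is one call, with no instanceof chain.
    @SuppressWarnings("unchecked")
    static Summary[] summarize(List<Object[]> rows, ColumnType[] schema) {
        ToDoubleFunction<Object>[] extractors = new ToDoubleFunction[schema.length];
        Summary[] summaries = new Summary[schema.length];
        for (int c = 0; c < schema.length; c++) {
            extractors[c] = extractorFor(schema[c]);
            if (extractors[c] != null) {
                summaries[c] = new Summary();
            }
        }
        for (Object[] row : rows) {
            for (int c = 0; c < schema.length; c++) {
                if (extractors[c] == null) {
                    continue; // non-numeric column: no summary in this sketch
                }
                if (row[c] == null) {
                    summaries[c].naCount++;
                } else {
                    summaries[c].add(extractors[c].applyAsDouble(row[c]));
                }
            }
        }
        return summaries;
    }

    public static void main(String[] args) {
        List<Object[]> rows = Arrays.asList(
                new Object[] {1, 2.5, "a"},
                new Object[] {3, null, "b"});
        ColumnType[] schema = {ColumnType.INT, ColumnType.DOUBLE, ColumnType.STRING};
        Summary[] result = summarize(rows, schema);
        System.out.println("mean of column 0: " + result[0].mean());
    }
}
```

In a Spark setting, the same per-row loop would presumably run inside the mapper over each partition, with the per-partition Summary objects merged in a reduce step; that merge is not shown here.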

ctn commented 9 years ago

@Huandao0812 is this still an issue?