ddf-project / DDF

Distributed DataFrame: Productivity = Power x Simplicity For Scientists & Engineers, on any Data Engine
http://ddf.io
Apache License 2.0

the implementation of BasicStatisticsComputer.GetSummaryMapper is inefficient #38

Open Huandao0812 opened 10 years ago

Huandao0812 commented 10 years ago

getSummaryImpl should compute the summary over RDD[Row]. Also, the if/else chain at https://github.com/ddf-project/DDF/blob/master/spark/src/main/java/io/spark/ddf/analytics/BasicStatisticsComputer.java#L54 is inefficient and unnecessary: we already have the column type, so there is no need to check each value manually with instanceof.
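To illustrate the idea, here is a minimal sketch in plain Java with no Spark dependency. It is not the DDF implementation: `ColumnSummarySketch`, `ColumnType`, `Summary`, `extractorFor` and `summarize` are hypothetical names, and rows are modelled as `Object[]` rather than Spark `Row`. The point is only that the per-column conversion can be chosen once from the schema, so the per-value loop needs no instanceof checks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch; none of these names come from the DDF codebase.
final class ColumnSummarySketch {

    // Assumed stand-in for the column types already recorded in the schema.
    enum ColumnType { INT, LONG, DOUBLE, STRING }

    // Running count, sum, min and max for one numeric column.
    static final class Summary {
        long count;
        long missing;
        double sum;
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;

        void add(double v) {
            if (Double.isNaN(v)) { missing++; return; } // NaN counts as missing
            count++;
            sum += v;
            if (v < min) min = v;
            if (v > max) max = v;
        }

        // Combine two partial summaries, e.g. results from different partitions.
        Summary merge(Summary o) {
            count += o.count;
            missing += o.missing;
            sum += o.sum;
            min = Math.min(min, o.min);
            max = Math.max(max, o.max);
            return this;
        }

        double mean() { return count == 0 ? Double.NaN : sum / count; }
    }

    // Picked once per column from the schema; null means "not numeric, skip".
    static ToDoubleFunction<Object> extractorFor(ColumnType t) {
        switch (t) {
            case INT:    return v -> ((Integer) v).doubleValue();
            case LONG:   return v -> ((Long) v).doubleValue();
            case DOUBLE: return v -> ((Double) v).doubleValue();
            default:     return null;
        }
    }

    // Summarize every numeric column; entries for other columns stay null.
    static Summary[] summarize(Iterable<Object[]> rows, ColumnType[] schema) {
        int n = schema.length;
        List<ToDoubleFunction<Object>> extract = new ArrayList<>();
        Summary[] out = new Summary[n];
        for (int i = 0; i < n; i++) {
            ToDoubleFunction<Object> f = extractorFor(schema[i]);
            extract.add(f);
            out[i] = (f == null) ? null : new Summary();
        }
        for (Object[] row : rows) {
            for (int i = 0; i < n; i++) {
                if (out[i] == null) continue;        // non-numeric column
                Object cell = row[i];
                if (cell == null) out[i].missing++;  // null cell counts as missing
                else out[i].add(extract.get(i).applyAsDouble(cell));
            }
        }
        return out;
    }
}
```

In a Spark setting the same `summarize` logic would presumably run once per partition, with `Summary.merge` combining the partial results, but that wiring is left out here.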

ctn commented 9 years ago

@Huandao0812 is this still an issue?