RevolutionAnalytics / dplyr-spark

spark backend for dplyr

`as.numeric` in sparkSQL #32

Open wush978 opened 8 years ago

wush978 commented 8 years ago

With dplyr v0.4.3, using `as.numeric` in a query against the Spark SQL backend can fail.

After executing the following R script:

dplyr::group_by(df, day) %>%
  dplyr::summarise(imp = count(adid), clk = mean(as.numeric(is_click))) %>%
  dplyr::collect()

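R raises an error from Spark: `org.apache.spark.sql.AnalysisException: cannot recognize input near 'numeric' ')' ')' in primitive type specification;`. The message points at the `numeric` type name in the generated cast, which the Spark SQL parser does not accept as a primitive type.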

I fixed this in https://github.com/bridgewell/dplyrSparkSQL/commit/781fba15b034686d9637e79378a90686434f63ef , following the Hive numeric types documentation at https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-NumericTypes . It seems that this package does not define these custom translators (https://github.com/RevolutionAnalytics/dplyr-spark/blob/e073c607970ce8a44088e3fa99c52a9cab7163e5/pkg/R/src-sparkSQL.R#L114).
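To illustrate the idea, here is a minimal base R sketch of a backend overriding an inherited cast translation. It is not dplyr's API and not the code from the linked commit. `make_cast`, `default_translators`, `spark_overrides` and `spark_translators` are hypothetical names, and the choice of `DOUBLE` and `INT` is an assumption based on the Hive numeric types page.

```r
# Hypothetical stand-in for a cast translator; this is NOT dplyr's API.
# It only builds SQL text so the two translations can be compared.
make_cast <- function(sql_type) {
  function(column) sprintf("CAST(%s AS %s)", column, sql_type)
}

# Translation suggested by the error message: a cast to NUMERIC,
# which the Spark SQL parser rejects.
default_translators <- list(as.numeric = make_cast("NUMERIC"))

# Backend-specific overrides, assuming Hive type names (DOUBLE, INT).
spark_overrides <- list(
  as.numeric = make_cast("DOUBLE"),
  as.integer = make_cast("INT")
)

# Entries in the second list replace same-named entries in the first,
# mirroring a backend translator that overrides inherited defaults.
spark_translators <- modifyList(default_translators, spark_overrides)

default_sql <- default_translators$as.numeric("is_click")
spark_sql <- spark_translators$as.numeric("is_click")

print(default_sql)
print(spark_sql)
```

In a real backend the override would live in the translator that the source registers with dplyr, so that `mean(as.numeric(is_click))` would be sent with a type name that Spark SQL accepts.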

Hope this helps.