danzafar / tidyspark

tidyspark: a tidyverse implementation of SparkR built for simplicity, elegance, and ease of use.

Fix lower unboundedPreceding windows #72

Open danzafar opened 4 years ago


In Scala, you define unboundedPreceding windows using the minimum long value, -9223372036854775808L. But SparkR passes R integers where Scala expects longs, and R's lowest representable integer is only -2147483647L (R reserves -2^31 for NA). Because of this, there is no way to express Window.unboundedPreceding from R.
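To make the mismatch concrete, here is a minimal Java sketch comparing the two sentinel values (the class and variable names are illustrative, not part of SparkR or Spark):

```java
public class WindowBounds {
    public static void main(String[] args) {
        // Scala/Java sentinel for an unbounded preceding frame: Long.MinValue
        long unboundedPreceding = Long.MIN_VALUE;  // -9223372036854775808

        // Smallest value R can actually send as an integer
        // (-2^31 itself is reserved for NA_integer_ in R)
        int rMinInteger = Integer.MIN_VALUE + 1;   // -2147483647

        System.out.println(unboundedPreceding);    // -9223372036854775808
        System.out.println(rMinInteger);           // -2147483647
        // The true sentinel is far below anything R can express as an integer
        System.out.println(unboundedPreceding < (long) rMinInteger);  // true
    }
}
```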

For example, here is the function for cumsum:

#' @export
cumsum.Column <- function(x) {
  # window ordered by x, with no partitioning
  wndw <- call_static("org.apache.spark.sql.expressions.Window",
                      "orderBy", list(x@jc))
  # sum(x) aggregate to apply over the window
  jc <- call_static("org.apache.spark.sql.functions", "sum", x@jc)
  # rowsBetween expects longs, but R can only send integers, so the
  # lower bound is capped at R's minimum integer rather than Long.MinValue
  new("Column",
      call_method(jc, "over",
        call_method(wndw, "rowsBetween", -2147483647L, 0L)))
}

This works unless the window needs to look back more than ~2.14 billion rows, at which point it breaks down silently. Fixing this will require a commit to the RBackendHandler Scala code to make a special exception for when R's minimum integer value, or another placeholder, is used, translating it to Long.MinValue.
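The proposed special-casing could look something like the following sketch, written in Java for illustration (the method name, the choice of R's minimum integer as the placeholder, and the mapping itself are all assumptions about what the RBackendHandler change might do, not existing Spark code):

```java
public class SentinelWiden {
    // R's minimum representable integer, proposed here as a placeholder
    // for Long.MIN_VALUE (hypothetical convention, not an existing API)
    static final int R_MIN_INT = Integer.MIN_VALUE + 1;  // -2147483647

    // Widen an incoming R integer to the long that rowsBetween expects,
    // treating the placeholder as "unbounded preceding"
    static long widenBound(int fromR) {
        return fromR == R_MIN_INT ? Long.MIN_VALUE : (long) fromR;
    }

    public static void main(String[] args) {
        System.out.println(widenBound(-2147483647));  // -9223372036854775808
        System.out.println(widenBound(-5));           // -5
        System.out.println(widenBound(0));            // 0
    }
}
```

With a mapping like this in place on the JVM side, the R code above could keep passing `-2147483647L` and get a genuinely unbounded lower frame boundary instead of a silent 2.14B-row cap.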