danzafar closed this issue 4 years ago
OK, the physical plans are pretty different, making me think the window is not being applied:
> spark_tbl(iris) %>%
+ group_by(Species) %>%
+ filter(Petal_Length < max(Sepal_Length)) %>%
+ explain
== Physical Plan ==
*(2) Project [Sepal_Length#374, Sepal_Width#375, Petal_Length#376, Petal_Width#377, Species#378]
+- *(2) Filter ((isnotnull(Petal_Length#376) && isnotnull(agg_col0#386)) && (Petal_Length#376 < agg_col0#386))
   +- Window [max(Sepal_Length#374) windowspecdefinition(Species#378, specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) AS agg_col0#386], [Species#378]
      +- *(1) Sort [Species#378 ASC NULLS FIRST], false, 0
         +- Exchange hashpartitioning(Species#378, 200)
            +- Scan ExistingRDD[Sepal_Length#374,Sepal_Width#375,Petal_Length#376,Petal_Width#377,Species#378]
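For reference, here is a plain-Python sketch of the semantics that the Window-based plan above implements: compute max(Sepal_Length) partitioned by Species (the `agg_col0` column), then keep rows where Petal_Length is below that per-group max. The tiny sample data is illustrative, not the real iris dataset.

```python
from collections import defaultdict

rows = [
    # (Sepal_Length, Petal_Length, Species)
    (5.1, 1.4, "setosa"),
    (4.9, 1.5, "setosa"),
    (7.0, 4.7, "versicolor"),
    (6.4, 7.2, "versicolor"),  # Petal_Length >= group max Sepal_Length, dropped
]

# Window step: max(Sepal_Length) over (partition by Species)
group_max = defaultdict(lambda: float("-inf"))
for sepal, _, species in rows:
    group_max[species] = max(group_max[species], sepal)

# Filter step: keep rows where Petal_Length < agg_col0
kept = [r for r in rows if r[1] < group_max[r[2]]]
```

The `==` pipeline below should compile to exactly the same shape, with the `<` comparison swapped for equality, but as the second plan shows it never generates the Window node at all.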
> spark_tbl(iris) %>%
+ group_by(Species) %>%
+ filter(Petal_Length == max(Sepal_Length)) %>%
+ explain
== Physical Plan ==
*(1) Filter (isnotnull(Petal_Length#406) && (Petal_Length#406 = max(Sepal_Length#404)))
+- Scan ExistingRDD[Sepal_Length#404,Sepal_Width#405,Petal_Length#406,Petal_Width#407,Species#408]
I'm seeing this error crop up in a few aggregated filter operations (see #17 and #18). Starting this thread to investigate directly. It seems this command works:
but this command does not
with error:
So this is an internal Spark issue. Let's figure out what exactly is going on here.
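One way to see what a fix should produce: materialize the grouped aggregate as its own column first (what a mutate()-then-filter() pipeline would do in dplyr terms), so the filter predicate never contains a raw `max(...)` call. A hypothetical plain-Python sketch of that two-step shape, again on illustrative data:

```python
rows = [
    # (Sepal_Length, Petal_Length, Species)
    (5.1, 1.4, "setosa"),
    (5.0, 5.1, "setosa"),    # Petal_Length equals the group max Sepal_Length
    (7.0, 4.7, "versicolor"),
]

# Aggregate step: per-group max(Sepal_Length)
group_max = {}
for sepal, _, species in rows:
    group_max[species] = max(group_max.get(species, float("-inf")), sepal)

# "mutate" step: attach the aggregate as a fourth column, so the equality
# filter becomes a plain row-wise comparison
with_max = [r + (group_max[r[2]],) for r in rows]
matches = [r for r in with_max if r[1] == r[3]]
```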