RevolutionAnalytics / dplyr-spark

Spark backend for dplyr

Problem with queries where a new column is used in the WHERE clause #5

Closed: piccolbo closed this issue 9 years ago

piccolbo commented 9 years ago

These queries are translated to

SELECT ..., newcol WHERE newcol > 2

This works in SQLite, but not here. It is triggered by filter(mutate(...
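To illustrate why the same SQL succeeds in SQLite, here is a minimal sketch using Python's built-in sqlite3 module. The table t and its values are hypothetical. SQLite resolves result-column aliases such as newcol inside WHERE as a non-standard extension, while standard SQL does not allow it. According to this issue, Spark SQL rejected the same pattern.

```python
import sqlite3

# In-memory SQLite database with a small hypothetical table `t`
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t (x) VALUES (?)", [(1,), (2,), (3,), (4,)])

# The pattern from the issue: `newcol` is defined in the SELECT list and then
# referenced in the WHERE clause of the same query. SQLite accepts this as an
# extension; standard SQL does not resolve SELECT aliases in WHERE.
alias_in_where = conn.execute(
    "SELECT x, x * 2 AS newcol FROM t WHERE newcol > 2 ORDER BY x"
).fetchall()
print(alias_in_where)  # [(2, 4), (3, 6), (4, 8)]
```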

piccolbo commented 9 years ago

The problem is in transmute_. I reported it to the dplyr team, see hadley/dplyr#1158.

piccolbo commented 9 years ago

I no longer think this is the same problem as dplyr#1158. By the way, we have a patch for that in dplyr.spark, and it is not even executed when doing a filter(mutate ...

piccolbo commented 9 years ago

filter(mutate works in SQLite as is. It works in Spark SQL if the WHERE clause is replaced with HAVING, and it works in R if a collapse() is put between mutate and filter.
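At the SQL level, the collapse() workaround can be pictured as a nested query: the mutate step becomes an inner SELECT that defines newcol, and the filter step becomes an outer WHERE on that derived table. Because newcol is then a real column, the WHERE clause is valid standard SQL. This is a minimal sketch with Python's sqlite3 module and a hypothetical table t. The query shape is an assumption, not the exact SQL that dplyr or dplyr.spark generates.

```python
import sqlite3

# In-memory SQLite database with a small hypothetical table `t`
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t (x) VALUES (?)", [(1,), (2,), (3,), (4,)])

# Portable rewrite: compute newcol in an inner query (the "mutate" step),
# then filter on it in the outer query (the "filter" step). Here newcol is a
# column of the derived table `sub`, so WHERE may reference it in standard SQL.
nested = conn.execute(
    """
    SELECT x, newcol
    FROM (SELECT x, x * 2 AS newcol FROM t) AS sub
    WHERE newcol > 2
    ORDER BY x
    """
).fetchall()
print(nested)  # [(2, 4), (3, 6), (4, 8)]
```

The nested form returns the same rows as the alias-in-WHERE query, but it does not rely on a dialect-specific extension.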

piccolbo commented 9 years ago

I asked for advice here: https://github.com/hadley/dplyr/issues/1319