-
`dplyr` offers an `order_by` argument for the `lead` and `lag` functions.
`sparklyr` does not seem to use this argument and instead requires the table to be ordered beforehand (by using `arrange` before …
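
To make the two approaches concrete, here is a minimal sketch on a local tibble (the data and the column names `t` and `v` are made up for illustration). It only checks the local `dplyr` semantics: `order_by` computes the lag in the order of `t` while leaving the rows where they are, whereas `arrange` followed by a plain `lag` reorders the rows first. Per the report, the second pattern is the one that works on a Spark table.

```r
library(dplyr)

# Hypothetical example data: rows are deliberately not sorted by `t`.
df <- tibble(t = c(3, 1, 2), v = c(30, 10, 20))

# Variant 1: let lag() handle the ordering; row order is unchanged.
a <- df %>% mutate(prev = lag(v, order_by = t))

# Variant 2: order the table first, then take a plain lag.
b <- df %>% arrange(t) %>% mutate(prev = lag(v))

print(a)
print(b)
```

Both variants pair each `t` with the same previous value; they differ only in the row order of the result.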
-
It would be nice if sparklyr were able to connect to an existing Spark session, similar to SparkR.
-
The `spark_tbl` does not provide a summarised result when used without a `group_by()`. Please refer to the issue raised in tidyverse:
https://github.com/tidyverse/dplyr/issues/5613
-
Notice the column `z` has type `chr`, not the expected `numeric`.
I have also filed a related [`dbplyr` issue](https://github.com/tidyverse/dplyr/issues/3081), but I don't know how many data sources …
-
---
Say `sparkD` is my R object that connects to a Spark DataFrame. In earlier versions of sparklyr, `sparkD + ` would print a `head` of the values of `sparkD`, which is practical for interactive use…
-
The default `dplyr::left_join()` suffix values are deadly to `sparklyr`. Notice below the suffix dots start looking like qualifiers and ruin the join.
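
Before the reproduction, here is a small sketch of one possible way to avoid the dotted defaults: pass explicit, dot-free `suffix` values. It runs on local tibbles with made-up names (`a`, `b`, `k`, `v`). Whether this also avoids the failure on a Spark table is an assumption, not something the report confirms.

```r
library(dplyr)

# Hypothetical tables sharing a key `k` and a clashing column `v`.
a <- tibble(k = 1:2, v = c("a", "b"))
b <- tibble(k = 1:2, v = c("c", "d"))

# Default suffixes are ".x" and ".y"; here they are replaced with
# underscore-based suffixes so no column name contains a dot.
j <- left_join(a, b, by = "k", suffix = c("_x", "_y"))

print(names(j))
```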
```r
library("dplyr")
## Warning: pac…
-
Hello everyone, I'm applying a custom R function to a Spark DataFrame using `spark_apply`, but the issue I'm facing is that the main function uses several other custom functions impl…
-
`sparklyr::sdf_seq()` will not work with the default `repartition` parameter. This is because it gets passed `type` which by default is one of `"integer"` or `"integer64"`, a character string.
```r…
-
We encounter an embedded nul issue while applying a method to parquet data.
```
source('spark_function.R')
library(tictoc)
library(arrow)
download.file('https://github.com/akashshah59/embedd…
-
Does `sparklyr` need a function which is analogous to `doParallel::clusterEvalQ`?
I am trying to parallelize some database queries, where each worker would pull a single day's worth of data from th…