-
Let's say I want to choose which type of average function to use depending on the standard deviation of a column. This is easy to achieve with a regular `data.frame`:
```r
mtcars %>% gr…
-
Do you plan to implement something like `ml_xgboost`?
(based on https://libraries.io/github/dmlc/xgboost)
-
-
A `spark_tbl` does not return a summarised result when `summarise()` is used without a `group_by()`.
```r
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':…
-
## Current Behavior
`conda uninstall r`
doesn't uninstall all the R packages I see when I run `conda list`:
```
r-assertthat 0.2.0 mro343h889e2dd_0 r
r-backports …
-
When using `spark_apply`, `NA` values in character vectors or factor vectors are lost. On closer inspection, they appear to be treated as the string `"NA"`.
Results using `dplyr`:
```r
data
-
https://databricks.com/blog/2018/05/03/benchmarking-apache-spark-on-a-single-node-machine.html and https://github.com/CODAIT/spark-bench (for data generation) may be good starting points
Also, alth…
-
Collecting 1M+ records several times triggers an exception in some environments; this is worth investigating further at some point.
```
pickup_dropoff
-
`knitr::kable()` seems to mis-order column names or column values with `sparklyr`. Notice the difference in directly piping the `sparklyr` table to `knitr::kable()` versus piping the `dplyr::collect(…
-
A user reported a difference of 5 minutes versus seconds when running `sparklyr_tbl %>% head()` with and without `tibble`. The problem is that `tibble` runs `nrow()` as part of [trunc_mat](https://github.com…