-
Merging data frames from multiple non-matching partitions of each data frame creates a lot of “shuffles” that may concentrate processing on one GPU rather than distributing it uniformly, and may spill to CPU or d…
-
Hello,
We have a setup where we process data incrementally against large Hudi tables in S3, using Hudi and Spark. When reading large tables from a different Spark process or when applying time cons…
-
Ever since upgrading to the latest version of dbplyr, our code output has been riddled with the warning
```
1: ORDER BY is ignored in subqueries without LIMIT
ℹ Do you need to move arrange() late…
-
Hello,
I am trying to apply a UDF after using the collect_list function. Here is a reproducible code:
```r
tab %
ungroup()
udf %
collect()
```
Here is the callstack from one of the ex…
-
Can we invoke .NET for Apache Spark from a .NET Core Web API? My requirement is a simple web page with a file upload button to upload a file and submit it. On submit, the application shoul…
-
Hi,
I have been trying to connect Spark 3.0 with Neo4j 4.1. However, the connector doesn't seem to work, it's throwing quite a lot of errors. Before sharing the specific errors I have I was just c…
-
## Description
Not an essential feature, since the current framework seems to be complete, but a nice bonus.
Create Stores based on dask dataframes (https://dask.org/) or delayed dask functions
…
-
**Describe the problem you faced**
Schema validation using:
```
hoodie.avro.schema.validate=true
```
always fails due to mismatched namespaces when writing using DeltaStreamer with `RowBased…
-
After updating to SparklyR 7.0, `sdf_copy_to` seems to run forever without failing. It worked fine in version 6.4. Have you had any experience with this?
Syntax: sdf_copy_to(sc,objectna…
-
I am trying to create a DataFrame of random values. I can do this from Scala with
```scala
org.apache.spark.mllib.random.RandomRDDs.uniformRDD(sc, 10).toDF().show()
// +-------------------+
// |…