-
Several Databricks fields are effectively maps, such as `custom_tags`, `default_tags`, `spark_conf`, `env_vars`, etc. These key-value pairs first land in the bronze layer as a struct where each key is …
-
#### What language are you using?
Python
#### What version of polars are you using?
0.13.21
#### What operating system are you using polars on?
Ubuntu 20.04.1 LTS
#### What language …
-
## Background [Optional]
We are trying to visualize the lineage (metadata) of dataframes produced by Spark. For this we have created a Spark job (the code is below).
## Question
We managed to read …
-
The solution the team provided for the issue of not being able to execute other cells after cancelling a cell (which stays "frozen" for a long time) is not a good enough solution,…
-
I am using Databricks with a Delta Lake in the background. The Databricks runtime is 10.2, with sparklyr 1.7.2.
```r
sc %>% … %>% filter(yearMonth >= 201801) %>% filter(yearMonth < 202202)
```
is much faster…
-
Spark 2.3 introduced a `repartitionByRange` option on dataframes. This could be used to improve the efficiency of `SortFullGroup` in the Parquet store (possibly avoiding the need to use RDDs, which co…
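The range-partitioning idea behind `repartitionByRange` can be sketched in plain Python (a toy illustration only, not Spark's implementation: Spark samples the data to choose boundaries, whereas the even quantile split below is an assumption for brevity):

```python
# Sketch of range partitioning: rows are assigned to partitions by
# comparing the partition key against sorted boundary keys, so each
# partition holds a contiguous, roughly equal-sized key range. This
# is what lets a later per-partition sort produce globally ordered
# output without a full shuffle sort.
import bisect

def range_partition(rows, key, num_partitions):
    """Assign each row to a partition based on sorted key boundaries."""
    keys = sorted(key(r) for r in rows)
    step = len(keys) / num_partitions
    # num_partitions - 1 boundary keys at evenly spaced ranks
    # (Spark instead derives these from a sample of the data).
    boundaries = [keys[int(step * i)] for i in range(1, num_partitions)]
    partitions = [[] for _ in range(num_partitions)]
    for r in rows:
        partitions[bisect.bisect_right(boundaries, key(r))].append(r)
    return partitions

rows = [{"id": i} for i in (5, 1, 9, 3, 7, 2, 8, 4, 6, 0)]
parts = range_partition(rows, key=lambda r: r["id"], num_partitions=2)
# parts[0] holds ids below the boundary, parts[1] the rest.
```

Because each partition covers a disjoint key range, sorting within partitions yields a total order, which is the property a grouped sort like `SortFullGroup` would rely on.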
-
Using Spark version 3.2.1
Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 11.0.14.1)
I load the XML files below. The first establishes the schema and the second contains the actual insta…
-
## Background
I have all of the pieces set up: the Spline web UI and ArangoDB.
I am trying to run Spline locally with SBT/IntelliJ through a unit test and am getting the error below:
20/07/13 20:20:11 ERROR Spark…
-
**What is your question?**
This is a question for the Spark team of RAPIDS. As part of the cuIO refactor, we (the RAPIDS cuDF team) are currently working on adding fuzz-testing coverage for our Avro reader…
-
Hi everyone,
I wrote a data-processing job in a Jupyter Notebook (SageMaker) with the awswrangler library. This code works perfectly in that environment, but when I try to run it on Glue, the code fini…