-
### Topic Suggestion
Creating a PySpark DataFrame: A Beginner's Guide
#### Proposed article introduction
We can distribute data and conduct calculations on several nodes of a cluster using Spark, a…
-
**Is your feature request related to a problem? Please describe.**
Problem: We need to calculate the similarity between texts stored in two columns of the same or different DataFrames.
For example, the…
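
To make the request concrete, here is a minimal sketch of one possible row-wise similarity measure. It uses plain Python and the standard-library `difflib.SequenceMatcher`; the issue does not say which metric is wanted, so the choice of metric, the `text_similarity` helper, and the sample rows are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def text_similarity(a: str, b: str) -> float:
    """Return a case-insensitive similarity ratio in [0.0, 1.0]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Hypothetical rows standing in for two text columns of a DataFrame.
rows = [
    ("apache spark", "Apache Spark"),
    ("data frame", "dataframe"),
    ("catboost", "pytorch"),
]

# One score per row: compare the left column with the right column.
scores = [text_similarity(left, right) for left, right in rows]

for (left, right), score in zip(rows, scores):
    print(f"{left!r} vs {right!r}: {score:.2f}")
```

In a Spark setting, a function of this shape could be applied per row (for example through a user-defined function) once both columns sit in the same DataFrame; comparing columns from different DataFrames would first need a join on some key.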
-
For test case:
```
test("test dataFrameComparer") {
  val df1 = spark.createDataFrame(
    spark.sparkContext.emptyRDD[Row],
    StructType(
      List(
        StructField("neste…
-
Sparklyr fails to parse `dplyr` syntax that uses the [`across`](https://dplyr.tidyverse.org/reference/across.html) function.
# Example
```r
# Settings
library("sparklyr", quietly = FALSE)
library("…
-
## Describe the proposal
Add an option to save a TorchScript model using `torch.jit.save` instead of `torch.save`, which enables deployment toolkits to pick up the optimized TorchScript model for production…
-
Problem:
With real-world Spark dataframes (e.g. 50 vector-assembled columns with real values, 130000 rows), I get this "An active CatBoost worker is already present in the current process" error when…
-
Completing these courses will provide sufficient technical knowledge for the internship:
- [x] [Introduction to PySpark](https://app.datacamp.com/learn/courses/introduction-to-pyspark)
- [x] [Dat…
-
I did a simple select using spark.read.bigquery and it works fine, but the moment I join with another table it breaks with an error saying invalid filter. Below is the code snippet:
val objkdf = spark.read.big…
-
Sorry if this is not the right channel to ask questions.
-
**UPDATE**: A temporary workaround - https://github.com/absaoss/spline-spark-agent/issues/272#issuecomment-895947366
The issue was found in, and is causing, AbsaOSS/spline#925.
See JSON sample in http…
wajda updated
2 years ago