-
The solution the team provided for the issue of not being able to execute other cells after cancelling another cell (which leaves that cell "frozen" for a long time) is not a good enough solution,…
-
I am using Databricks with a Delta Lake in the background. The Databricks runtime is 10.2, with sparklyr 1.7.2.
```r
sc %>% filter(yearMonth >= 201801) %>% filter(yearMonth < 202202)
```
is much faste…
-
Spark 2.3 introduced a `repartitionByRange` option on dataframes. This could be used to improve the efficiency of `SortFullGroup` in the Parquet store (possibly avoiding the need to use RDDs, which co…
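The idea behind `repartitionByRange` is to sample the partitioning key, derive sorted range boundaries, and route each row to the partition whose range contains its key, so partitions come out globally ordered. A minimal pure-Python sketch of that idea (not Spark's implementation; the function name and sampling scheme are illustrative):

```python
from bisect import bisect_left
import random

def repartition_by_range(rows, key, num_partitions, sample_size=100):
    """Sketch of range partitioning: sample keys, pick boundary values,
    then bucket each row by which key range it falls into."""
    sample = sorted(key(r) for r in random.sample(rows, min(sample_size, len(rows))))
    # Take num_partitions - 1 evenly spaced boundary values from the sample.
    step = max(1, len(sample) // num_partitions)
    boundaries = sample[step::step][: num_partitions - 1]
    partitions = [[] for _ in range(len(boundaries) + 1)]
    for r in rows:
        # bisect_left finds the first boundary >= key, i.e. the partition index.
        partitions[bisect_left(boundaries, key(r))].append(r)
    return partitions
```

Because rows are bucketed by key range rather than hash, a subsequent per-partition sort yields a total order, which is what makes it attractive for a sort-then-group pass like `SortFullGroup`.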
-
**What is your question?**
This is a question for the Spark team of RAPIDS. As part of the cuIO refactor, we (the RAPIDS cuDF team) are currently working on adding fuzz-testing coverage for our Avro reader…
-
### Topic Suggestion
Creating a PySpark DataFrame: A Beginner's Guide
#### Proposed article introduction
We can distribute data and conduct calculations on several nodes of a cluster using Spark, a…
-
Using Spark version 3.2.1
Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 11.0.14.1)
I load the XML files below. The first establishes the schema and the second contains the actual insta…
-
**Is your feature request related to a problem? Please describe.**
Problem: I need to calculate the similarity between texts stored in two columns of the same or different dataframes.
For example, the…
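In spirit, such a feature wraps a row-wise text-similarity metric applied across two columns. A minimal pure-Python sketch using token-level Jaccard similarity (outside Spark, with illustrative names; in Spark this logic would typically live in a UDF):

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two texts (0.0 to 1.0)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0  # two empty texts are treated as identical
    return len(ta & tb) / len(ta | tb)

def column_similarity(col_a, col_b):
    """Row-wise similarity between two columns (plain lists here)."""
    return [jaccard_similarity(x, y) for x, y in zip(col_a, col_b)]
```

Any row-wise metric (Levenshtein, cosine over embeddings, etc.) could be swapped in for the Jaccard function; the column-pairing shape stays the same.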
-
For test case:
```scala
test("test dataFrameComparer") {
  val df1 = spark.createDataFrame(
    spark.sparkContext.emptyRDD[Row],
    StructType(
      List(
        StructField("neste…
-
Sparklyr fails to parse `dplyr` syntax that uses the [`across`](https://dplyr.tidyverse.org/reference/across.html) function.
# Example
```r
# Settings
library("sparklyr", quietly = FALSE)
library("…
-
Problem:
With real-world Spark dataframes (e.g. 50 vector-assembled columns with real values, 130000 rows), I get this "An active CatBoost worker is already present in the current process" error when…