-
# Context
**PipelineDP** now supports three execution modes: with Apache Spark, with Apache Beam, or without any framework ([here](https://github.com/OpenMined/PipelineDP/blob/main/examples/movie_view_ratings/run_all_…
-
We need to examine Spark's DataFrame API as a possible alternative for representing our data (beyond RDDs). DataFrames are structured abstractions; as such, Spark understands the schema prior to execu…
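The practical benefit of a schema known before execution can be illustrated with a toy sketch in plain Python; this is only an illustration of the idea (declared types let malformed records and invalid queries be caught up front), not Spark's actual implementation, and the field names are made up:

```python
# Toy illustration: with a declared schema, records can be validated
# before any processing runs -- the property that lets Spark plan and
# optimize DataFrame queries, unlike schema-less RDDs.
schema = {"movie_id": int, "rating": float}

def validate(row: dict) -> bool:
    """Check a record against the declared schema."""
    return (row.keys() == schema.keys()
            and all(isinstance(row[k], t) for k, t in schema.items()))

rows = [
    {"movie_id": 1, "rating": 4.5},    # conforms to the schema
    {"movie_id": 2, "rating": "bad"},  # wrong type, rejected up front
]
valid = [r for r in rows if validate(r)]
```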
-
### Apache Iceberg version
1.4.3 (latest release)
### Query engine
Spark
### Please describe the bug 🐞
I get an error when I try to append data using the `writeTo` API in PySpark with dat…
-
Since Spark 2.3 there is the PySpark function [eqNullSafe](https://spark.apache.org/docs/2.3.0/api/python/pyspark.sql.html#pyspark.sql.Column.eqNullSafe), which seems a much better way to compare colum…
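The null-safe semantics that `eqNullSafe` provides can be sketched in plain Python, with `None` standing in for SQL NULL. This is a sketch of the semantics only, not Spark code; in PySpark the comparison is written as `col("a").eqNullSafe(col("b"))`:

```python
def eq(a, b):
    """Ordinary SQL equality: any comparison involving NULL yields NULL."""
    if a is None or b is None:
        return None
    return a == b

def eq_null_safe(a, b):
    """Null-safe equality (SQL's <=>): NULL <=> NULL is True, never NULL."""
    if a is None and b is None:
        return True
    if a is None or b is None:
        return False
    return a == b

# With plain equality, NULL == NULL does not evaluate to True, so such
# rows silently drop out of a join or filter; eqNullSafe keeps them.
assert eq(None, None) is None
assert eq_null_safe(None, None) is True
assert eq_null_safe(None, 1) is False
assert eq_null_safe(1, 1) is True
```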
-
**Is your feature request related to a problem? Please describe.**
I have datasets ranging from 50 GB to up to 500 GB, with likely growth to 2+ TB within 5 years. Within these datasets many objects…
-
I am looking for a way to do something like pandas' get_dummies() on Spark. Is something like this planned anytime soon?
If not: could you point me in the right direction on how to impl…
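For what it's worth, the core of get_dummies (one-hot encoding a categorical column) can be sketched in plain Python. On Spark the same idea is typically expressed by collecting the distinct category values and adding one indicator column per value; the column names below are made up for the example:

```python
def get_dummies(rows, column):
    """One-hot encode `column`: replace it with one 0/1 indicator
    column per distinct value, pandas-get_dummies style."""
    categories = sorted({row[column] for row in rows})
    out = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for cat in categories:
            new_row[f"{column}_{cat}"] = 1 if row[column] == cat else 0
        out.append(new_row)
    return out

rows = [{"id": 1, "color": "red"}, {"id": 2, "color": "blue"}]
encoded = get_dummies(rows, "color")
```

Note that on a distributed engine the distinct-values pass is a separate job, which is why there is no single built-in equivalent.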
-
The variable explorer and dataframe editor are quite handy for Pandas dataframes - just being able to see how the structure looks helps guide the coding immensely.
With Apache Spark becoming increasi…
-
As of now, each run of the integration tests can take more than 3 hours (more than 4 hours on Databricks). We can consider caching the input data, storing all the randomly generated data in some stati…
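A complementary step while a static cache is worked out is to seed the random generator, so the generated input is reproducible across runs and can be written out once as a fixture. This is a generic sketch; the function name and row shape are made up:

```python
import random

def generate_test_rows(n, seed=42):
    """Generate reproducible pseudo-random test rows: a fixed seed means
    every run (and every cache refresh) sees identical data."""
    rng = random.Random(seed)  # local generator; global RNG state untouched
    return [{"id": i, "value": rng.randint(0, 1000)} for i in range(n)]

# Two independent calls with the same seed yield identical data, so the
# output could be generated once and stored as a static test fixture.
a = generate_test_rows(5)
b = generate_test_rows(5)
assert a == b
```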
-
### Missing functionality
I use [ibis](https://github.com/ibis-project/ibis). I would love to be able to profile Ibis Tables, as [I brought up in their issue tracker](https://github.com/ibis-project/…
-
I have two data frames with the same schema. Is there a way to compare the two data frames so that it provides the added, deleted, and modified rows? It may take some single/group of key columns and…
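The usual approach is a full outer join on the key column(s), classifying each key as added, deleted, or modified. The logic can be sketched in plain Python over lists of dicts; the key and column names are illustrative, and on Spark the same classification would be expressed with a full outer join:

```python
def diff_by_key(old_rows, new_rows, key):
    """Classify rows as added / deleted / modified by comparing two
    datasets on a key column -- the same logic a full outer join
    on the key expresses in Spark."""
    old = {r[key]: r for r in old_rows}
    new = {r[key]: r for r in new_rows}
    added    = [new[k] for k in sorted(new.keys() - old.keys())]
    deleted  = [old[k] for k in sorted(old.keys() - new.keys())]
    modified = [new[k] for k in sorted(old.keys() & new.keys())
                if old[k] != new[k]]
    return added, deleted, modified

old = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
new = [{"id": 2, "v": "B"}, {"id": 3, "v": "c"}]
added, deleted, modified = diff_by_key(old, new, "id")
```

A composite key works the same way if the dicts are keyed on a tuple of the key columns instead of a single value.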