-
# Context
**PipelineDP** now supports three execution modes: with Apache Spark, with Apache Beam, or without any framework ([here](https://github.com/OpenMined/PipelineDP/blob/main/examples/movie_view_ratings/run_all_…
-
We need to examine Spark's DataFrame API as a possible alternative for representing our data (beyond RDDs). DataFrames are structured abstractions; as such, Spark understands the schema prior to execu…
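The practical benefit of a schema known before execution can be illustrated with a toy sketch in plain Python; this is only an illustration of the idea (declared types let malformed records and invalid queries be caught up front), not Spark's actual implementation, and the field names are made up:

```python
# Toy illustration: with a declared schema, records can be validated
# before any processing runs -- the property that lets Spark plan and
# optimize DataFrame queries, unlike schema-less RDDs.
schema = {"movie_id": int, "rating": float}

def validate(row: dict) -> bool:
    """Check a record against the declared schema."""
    return (row.keys() == schema.keys()
            and all(isinstance(row[k], t) for k, t in schema.items()))

rows = [
    {"movie_id": 1, "rating": 4.5},    # conforms to the schema
    {"movie_id": 2, "rating": "bad"},  # wrong type, rejected up front
]
valid = [r for r in rows if validate(r)]
```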
-
### Apache Iceberg version
1.4.3 (latest release)
### Query engine
Spark
### Please describe the bug 🐞
I get an error when I try to append data using the `writeTo` API in PySpark with dat…
-
Since Spark 2.3 there is the PySpark function [eqNullSafe](https://spark.apache.org/docs/2.3.0/api/python/pyspark.sql.html#pyspark.sql.Column.eqNullSafe), which seems a much better way to compare colum…
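The null-safe semantics that `eqNullSafe` provides can be sketched in plain Python, with `None` standing in for SQL NULL. This is a sketch of the semantics only, not Spark code; in PySpark the comparison is written as `col("a").eqNullSafe(col("b"))`:

```python
def eq(a, b):
    """Ordinary SQL equality: any comparison involving NULL yields NULL."""
    if a is None or b is None:
        return None
    return a == b

def eq_null_safe(a, b):
    """Null-safe equality (SQL's <=>): NULL <=> NULL is True, never NULL."""
    if a is None and b is None:
        return True
    if a is None or b is None:
        return False
    return a == b

# With plain equality, NULL == NULL does not evaluate to True, so such
# rows silently drop out of a join or filter; eqNullSafe keeps them.
assert eq(None, None) is None
assert eq_null_safe(None, None) is True
assert eq_null_safe(None, 1) is False
assert eq_null_safe(1, 1) is True
```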
-
**Is your feature request related to a problem? Please describe.**
I have datasets ranging from 50 GB to up to 500 GB, with likely growth to 2+ TB within 5 years. Within these datasets many objects…
-
I am looking for a way to do something like pandas' get_dummies() on Spark. Is something like this planned anytime soon?
If not: could you point me in the right direction on how to impl…
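For what it's worth, the core of get_dummies (one-hot encoding a categorical column) can be sketched in plain Python. On Spark the same idea is typically expressed by collecting the distinct category values and adding one indicator column per value; the column names below are made up for the example:

```python
def get_dummies(rows, column):
    """One-hot encode `column`: replace it with one 0/1 indicator
    column per distinct value, pandas-get_dummies style."""
    categories = sorted({row[column] for row in rows})
    out = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for cat in categories:
            new_row[f"{column}_{cat}"] = 1 if row[column] == cat else 0
        out.append(new_row)
    return out

rows = [{"id": 1, "color": "red"}, {"id": 2, "color": "blue"}]
encoded = get_dummies(rows, "color")
```

Note that on a distributed engine the distinct-values pass is a separate job, which is why there is no single built-in equivalent.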
-
The variable explorer and dataframe editor are quite handy for Pandas dataframes - just being able to see how the structure looks helps guide the coding immensely.
With Apache Spark becoming increasi…
-
As of now, each run of the integration tests can take more than 3 hours (more than 4 hours on Databricks). We can consider caching the input data, storing all the randomly generated data in some stati…
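A complementary step while a static cache is worked out is to seed the random generator, so the generated input is reproducible across runs and can be written out once as a fixture. This is a generic sketch; the function name and row shape are made up:

```python
import random

def generate_test_rows(n, seed=42):
    """Generate reproducible pseudo-random test rows: a fixed seed means
    every run (and every cache refresh) sees identical data."""
    rng = random.Random(seed)  # local generator; global RNG state untouched
    return [{"id": i, "value": rng.randint(0, 1000)} for i in range(n)]

# Two independent calls with the same seed yield identical data, so the
# output could be generated once and stored as a static test fixture.
a = generate_test_rows(5)
b = generate_test_rows(5)
assert a == b
```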
-
### Missing functionality
I use [ibis](https://github.com/ibis-project/ibis). I would love to be able to profile Ibis Tables, as [I brought up in their issue tracker](https://github.com/ibis-project/…
-
I have two data frames with the same schema. Is there a way to compare the two data frames so that it provides the added, deleted, and modified rows? It may take some single/group of key columns and…
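The usual approach is a full outer join on the key column(s), classifying each key as added, deleted, or modified. The logic can be sketched in plain Python over lists of dicts; the key and column names are illustrative, and on Spark the same classification would be expressed with a full outer join:

```python
def diff_by_key(old_rows, new_rows, key):
    """Classify rows as added / deleted / modified by comparing two
    datasets on a key column -- the same logic a full outer join
    on the key expresses in Spark."""
    old = {r[key]: r for r in old_rows}
    new = {r[key]: r for r in new_rows}
    added    = [new[k] for k in sorted(new.keys() - old.keys())]
    deleted  = [old[k] for k in sorted(old.keys() - new.keys())]
    modified = [new[k] for k in sorted(old.keys() & new.keys())
                if old[k] != new[k]]
    return added, deleted, modified

old = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
new = [{"id": 2, "v": "B"}, {"id": 3, "v": "c"}]
added, deleted, modified = diff_by_key(old, new, "id")
```

A composite key works the same way if the dicts are keyed on a tuple of the key columns instead of a single value.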