-
**Description**
I have two PySpark dataframes, `source_df` and `target_df`. I ran `pip install pyspark-extension` to install diff.
Spark Version - 3.4.1
Scala Version - 2.12
When I run `source_…
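The command itself is cut off above, but the diff transformation in spark-extension compares two dataframes row-by-row on a key and labels each row with a diff value: N (no change), C (changed), D (only in the left/source side), or I (only in the right/target side). Those semantics can be sketched in plain Python (the sample rows here are made up for illustration):

```python
def diff(source, target, key="id"):
    # Label each key: N (unchanged), C (changed), D (only in source),
    # I (only in target) -- mirroring spark-extension's diff column values.
    left = {row[key]: row for row in source}
    right = {row[key]: row for row in target}
    result = {}
    for k in left.keys() | right.keys():
        if k not in right:
            result[k] = "D"
        elif k not in left:
            result[k] = "I"
        elif left[k] == right[k]:
            result[k] = "N"
        else:
            result[k] = "C"
    return result

source_df = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}]
target_df = [{"id": 1, "v": "a"}, {"id": 2, "v": "x"}, {"id": 4, "v": "d"}]
changes = diff(source_df, target_df)
# changes == {1: "N", 2: "C", 3: "D", 4: "I"}
```

In PySpark itself the call would be along the lines of `source_df.diff(target_df, "id")` after importing the diff extension, if memory serves; check the spark-extension README for the exact import path for your Spark/Scala version.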
-
Dear @cryeo,
I really like your library, as it makes it possible to integrate SQL syntax directly into cells; that's a nice piece of work!
However I would like to hear from you what's the best way …
-
Users often ask about the limitations of KDF when handling large dataframes.
The User Guide should contain some recommendations and code snippets to improve the user path here:
- some benchmarks on real-w…
-
### What motivated this proposal?
Is there a way to micro-batch the actual NATS streams? Let's say I just want to try and pull 20,000 messages, if there are that many, or whatever is in the queue. …
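Independent of NATS specifics, the pull pattern being asked for can be sketched in plain Python; the deque below is a hypothetical stand-in for a subscription's pending messages (drain up to N per call, return fewer when that's all there is):

```python
from collections import deque

def fetch_batch(pending, batch_size=20_000):
    # Drain up to batch_size messages; return a short batch if fewer are queued.
    batch = []
    while pending and len(batch) < batch_size:
        batch.append(pending.popleft())
    return batch

# 45,000 fake messages -> expect batches of 20k, 20k, 5k.
pending = deque(range(45_000))
sizes = []
while pending:
    sizes.append(len(fetch_batch(pending, batch_size=20_000)))
# sizes == [20000, 20000, 5000]
```

With NATS JetStream, a pull consumer exposes a similar batch-oriented fetch; the snippet above only shows the batching logic, not the client API.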
-
I'm really interested in using Spark and would love to be able to interact with it using Ruby. This gem looks like a great option. It doesn't look like it would natively support Spark dataframes, righ…
-
The goal is to work with really large datasets and extract the results of large queries into a Spark dataframe; this will allow us to work with pqs and Spark to do large-scale feature transformation a…
-
When I write two BigQuery dataframes in a for loop using the function saveAsBigQueryTable(projectid+schemaname+tablename), it gives an error when writing the table in BigQuery: Conflic…
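One thing worth checking (an assumption, since the full error message is cut off): `projectid+schemaname+tablename` concatenates the three identifiers with no separators, so two different (project, schema, table) triples can collapse to the same string, and successive loop iterations may end up targeting the same table name. A fully qualified BigQuery table reference needs explicit separators:

```python
def unqualified(project, schema, table):
    # Plain + concatenation runs the identifiers together.
    return project + schema + table

def qualified(project, schema, table):
    # Explicit dots keep the reference unambiguous.
    return f"{project}.{schema}.{table}"

# Two different tables that collide without separators...
assert unqualified("proj", "salesdata", "q1") == unqualified("projsales", "data", "q1")
# ...but stay distinct once separators are added.
assert qualified("proj", "salesdata", "q1") != qualified("projsales", "data", "q1")
```

The identifier names here are hypothetical; the point is only that the string passed to the save call should contain separators between project, schema, and table.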
-
That would be neat. I searched around but didn't find what I was looking for. Any help appreciated!
-
from petastorm.spark import SparkDatasetConverter, make_spark_converter
# specify a cache dir first.
# the dir is used to save materialized spark dataframe files
spark.conf.set(SparkDatasetCon…
-
Some of the currently implemented caching solutions in Spark, namely `CachedWebCrawlerJob` and `PatentMetadataRetrieverJob`, rely on RDDs, while we could take advantage of the full potential of …