-
Hello,
Good job on Spark.jl.
I have an issue: I tried to learn Spark and followed the documentation:
> This is a quick introduction into the Spark.jl core functions. It closely follows the …
-
Using the DataFrame API instead of RDDs directly may provide a speed improvement through the Catalyst optimizer.
-
When running a comparison on dataframes with a single column, the following exception is thrown:
```
/opt/venv/lib/python3.8/site-packages/datacompy/spark.py:356: in rows_both_mismatch
self._…
-
**Describe the bug**
When using a Spark DataFrame with the latest great-expectations, the following error is thrown:
Bad input to build_batch_request: Can not build batch request for dataframe asset without a datafram…
-
**Is your feature request related to a problem? Please describe.**
It would be interesting to analyze dependencies at the column level, in order to
- understand what transformations have been applied "d…
-
Hi,
I'm considering writing an extension that makes it possible to use Spark DataFrames with this tool, as they are pretty similar to pandas DataFrames but do not necessarily have the same problems re…
-
As far as I understand, Breeze was designed in order to take advantage of BLAS/LAPACK and even specialized implementations for GPUs. So, chances are that my question below is simply out of context, or…
-
### Apache Iceberg version
1.5.2 (latest release)
### Query engine
Spark
### Please describe the bug 🐞
Hello,
We have an existing working Spark Scala job (Spark 3.2.0 Iceberg 1.4.0 Sca…
-
**Is your feature request related to a problem? Please describe.**
When the data size is quite large, we often need to work with larger-than-RAM data. Also, using an engine like Polars will spee…
-
### Motivation: Why do you think this is important?
Flyte does not support nested `StructuredDatasets`, for example:
```
schema = StructType([
StructField('name', StructType([
…