-
I am applying around 400 data quality (DQ) checks to a table with 30M rows and 250 columns, and around 25% of these checks only apply to a subset of rows. There is too much data to use pandas DataFrames. I…
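A minimal sketch of the kind of subset-scoped check I mean, assuming plain PySpark; the source path and the column names (`status`, `amount`) are made up for illustration:

```python
# Hedged sketch: run one "subset-only" data quality check with PySpark,
# so the full 30M-row table never needs to fit into a pandas DataFrame.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.read.parquet("s3://bucket/big_table")  # hypothetical source

# Example check: "amount must be non-negative", but only for completed orders.
subset = df.filter(F.col("status") == "COMPLETED")      # hypothetical columns
violations = subset.filter(F.col("amount") < 0).count()
applicable = subset.count()

print(f"check violated on {violations} of {applicable} applicable rows")
```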
-
The ability to write the query result to a file (e.g. via `--output-path`) should be extended to other formats such as compressed JSON, Parquet, CSV, etc.
This would extend Rumble to a co…
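This is not Rumble's API, just a PySpark sketch of the writers such a feature would presumably delegate to, since Rumble already runs on top of Spark; paths are placeholders:

```python
# Hedged sketch: the requested output formats, written with plain Spark.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
result = spark.read.json("query-result.json")  # hypothetical materialized query result

result.write.mode("overwrite").option("compression", "gzip").json("out/json-gz")
result.write.mode("overwrite").parquet("out/parquet")
result.write.mode("overwrite").option("header", True).csv("out/csv")
```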
-
Could you add support for converting `mdf4` to `parquet`? It is really important for people working with Spark.
I know `asammdf` has the ability to convert to `parquet`, but sometimes it doesn't wor…
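For context, this is the workaround I use today (a hedged sketch, not a proposed API): asammdf loads the MF4 file into pandas, and pandas writes Parquet for Spark. File names are examples, and `df.to_parquet` needs pyarrow or fastparquet installed:

```python
# Hedged sketch of the MF4 -> pandas -> Parquet workaround.
from asammdf import MDF

mdf = MDF("measurement.mf4")           # example input file
df = mdf.to_dataframe()                # one wide pandas DataFrame
df.to_parquet("measurement.parquet")   # requires pyarrow or fastparquet
```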
-
Please fill out the form below.
### System Information
- **Spark or PySpark**: PySpark
- **SDK Version**: sagemaker-pyspark==1.2.2.post0
- **Spark Version**: 2.4.2
- **Algorithm (e.g. KMeans)**…
-
Hey @kvnkho
I'm back using `fugue` again. I was wondering what the canonical `fugue` method is for loading multiple CSVs into `fugue`? I can write this up as a recipe afterwards if you like.
I have…
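I'm not claiming this is the canonical way; it's just the pattern I fall back to: let the execution engine (Spark here) glob the CSVs, then hand the result to Fugue's `transform()`. The file pattern, the `value` column, and the `add_flag` function are made-up examples:

```python
# Hedged sketch: load many CSVs with Spark, then process them through Fugue.
import pandas as pd
from pyspark.sql import SparkSession
from fugue import transform

spark = SparkSession.builder.getOrCreate()
df = spark.read.csv("data/part-*.csv", header=True, inferSchema=True)

def add_flag(pdf: pd.DataFrame) -> pd.DataFrame:
    # placeholder per-partition transformation
    pdf["flag"] = pdf["value"] > 0
    return pdf

result = transform(df, add_flag, schema="*,flag:bool", engine=spark)
result.show(5)
```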
-
**Is your feature request related to a problem? Please describe.**
We are trying to consume two different sources: one remote (e.g. BigQuery) and one local Parquet file, which is currently not po…
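Purely as illustration of the workflow we're after (outside this project, with placeholder table and path names, and an example connector version): plain PySpark with the spark-bigquery-connector can already join a remote BigQuery table against a local Parquet file:

```python
# Hedged sketch: joining a remote BigQuery source with a local Parquet file in Spark.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # example connector coordinates; pick the version matching your Spark/Scala build
    .config("spark.jars.packages",
            "com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.32.2")
    .getOrCreate()
)

remote = spark.read.format("bigquery").option("table", "project.dataset.events").load()
local = spark.read.parquet("/data/users.parquet")

remote.join(local, on="user_id", how="inner").show(5)
```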
-
# draft 1
~~1. Command-Line Tasks
1.1 Getting Started with the Scala REPL
1.2 Loading Source Code and JAR Files into the REPL
1.3 Getting Started wit…
-
Hi.
I downloaded the current source code (commit [0e5b8c3](https://github.com/spidru/JGribX/commit/0e5b8c3e2d1b52cb9578bda811ac30b6ad2ab15e)) and tried to build it with Gradle in IntelliJ. When doing i…
-
Most of the articles and blog posts talk about the SQL-via-DataFrames approach, so I'm just wondering: **Can I run SQL queries via spark-rapids [Spark SQL shell]?**
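For what it's worth, here is a hedged sketch of what I have in mind: the RAPIDS Accelerator is registered as a Spark plugin, so the same SparkSession that accelerates the DataFrame API should also handle `spark.sql()` queries (assuming the RAPIDS jars are already on the classpath; the data path is a placeholder):

```python
# Hedged sketch: running plain SQL through a RAPIDS-enabled SparkSession.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.plugins", "com.nvidia.spark.SQLPlugin")
    .config("spark.rapids.sql.enabled", "true")
    .getOrCreate()
)

spark.read.parquet("/data/sales.parquet").createOrReplaceTempView("sales")
spark.sql("SELECT region, SUM(amount) AS total FROM sales GROUP BY region").show()
```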
-
Let's add support for [`InMemoryRelation`](https://spark.apache.org/docs/1.3.1/api/java/org/apache/spark/sql/columnar/InMemoryRelation.html) (see also InMemoryRelation [internals](https://jaceklaskows…
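For background, a small PySpark sketch (an assumption about the typical repro, not part of this request): `InMemoryRelation` is the plan node Spark inserts when a dataset is cached, so it shows up as soon as you cache a DataFrame and inspect its plan:

```python
# Hedged sketch: surface InMemoryRelation in a query plan by caching a DataFrame.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.range(1_000_000)
df.cache()
df.count()          # materialize the cache

# The explained plan now contains InMemoryRelation / InMemoryTableScan nodes.
df.explain(True)
```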