-
The draft list of data sources:
1. SQL databases (via JDBC)
2. XML
3. Protobuf
4. Parquet
5. ORC
6. SparkSQL
7. Various files on the file system
8. NoSQL databases (MongoDB, Cassandra, I…
-
Would it make sense to add support for `avro` schemas in `TypedDataSet`?
The current code defines the schema in terms of the `SparkSQL` "language": https://github.com/typelevel/frameless…
-
Environment: Spark 3.1, `sparksql-scalapb_2.12:0.11.0`

When used in a Dataset, ScalaPB-generated case classes don't catch type mismatches in the logical plan the way ordinary Scala case classes do.
For example, in Spark…
-
Hi,
We tested sparksql-protobuf here and it works pretty well, but we need to generate the protobuf classes with [Protoc](https://github.com/os72/protoc-jar-maven-plugin).
We have created a feature t…
-
### First Bug
```sql
set livy.session.conf.spark.driver.maxResultSize=8g;
set livy.session.conf.spark.driver.memory=12g;
set livy.session.conf.spark.driver.memoryOverhead=8g;
set livy.session.con…
```
-
### Apache Iceberg version
1.5.0 (latest release)
### Query engine
Spark
### Please describe the bug 🐞
Previously, my pipeline used Iceberg 1.3 on Dataproc (image version 2.1, which has Spark …
-
Typical analytical data is denormalized and contains maps of primitive types. The PostgreSQL [hstore](https://www.postgresql.org/docs/current/hstore.html) extension/type solves this, and Clickhouse also supp…
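The following sketch illustrates what a "map of primitive types" column looks like. It places a PostgreSQL hstore column next to the equivalent built-in `MAP<STRING, STRING>` column in Spark SQL. The table `events` and its columns are hypothetical. The two halves run on different engines.

```sql
-- PostgreSQL: hstore stores a set of text key/value pairs in a single column
CREATE EXTENSION IF NOT EXISTS hstore;
CREATE TABLE events (id bigint, attrs hstore);
INSERT INTO events VALUES (1, 'browser => firefox, os => linux');
-- The -> operator returns the value for a key (NULL if the key is absent)
SELECT id, attrs -> 'browser' AS browser FROM events;

-- Spark SQL (separate engine): the same shape using the built-in MAP type
CREATE TABLE events (id BIGINT, attrs MAP<STRING, STRING>) USING parquet;
INSERT INTO events VALUES (1, map('browser', 'firefox', 'os', 'linux'));
SELECT id, attrs['browser'] AS browser FROM events;
```

hstore keys and values are always text. A typed map, such as Spark's `MAP<K, V>`, can also hold non-string primitive values.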
-
**Describe the bug**
When processing SQL scripts with "backwards" grammar (the statement starts with `FROM`), `LineageRunner().target_tables` fails to parse them correctly.
**SQL**
```sql
FROM `d…
```
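The truncated example above begins with `FROM`, which suggests the FROM-first (multi-insert) form supported by HiveQL and Spark SQL. The sketch below assumes that "backwards" refers to this form. All database and table names are hypothetical. A lineage tool would be expected to report `src_db.src_table` as the source and both `dst_db.dst_a` and `dst_db.dst_b` as targets.

```sql
-- FROM-first form: the source is named once, before one or more INSERT clauses
FROM src_db.src_table s
INSERT OVERWRITE TABLE dst_db.dst_a
  SELECT s.id, s.name WHERE s.dt = '2024-01-01'
INSERT OVERWRITE TABLE dst_db.dst_b
  SELECT s.id WHERE s.dt = '2024-01-02';

-- Conventional order for the first target, for comparison
INSERT OVERWRITE TABLE dst_db.dst_a
SELECT s.id, s.name
FROM src_db.src_table s
WHERE s.dt = '2024-01-01';
```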
-
Dropping a column used by the most recent PartitionSpec fails cleanly; however, dropping a column used by an older PartitionSpec corrupts the table entirely. For example, in SparkSQL:
```sql
CREATE T…
```
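The original reproduction is truncated. The following sketch outlines the described scenario using Iceberg's Spark SQL extensions for partition evolution. These require `spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions`. The catalog, table, and column names are hypothetical. The outcomes in the comments restate the report and have not been independently verified.

```sql
-- Spec 0: identity partitioning on category
CREATE TABLE demo.db.events (id BIGINT, category STRING, ts TIMESTAMP)
USING iceberg
PARTITIONED BY (category);

-- Write data files under spec 0
INSERT INTO demo.db.events VALUES (1, 'a', TIMESTAMP '2024-01-01 00:00:00');

-- Spec 1: replace the category partition field with days(ts)
ALTER TABLE demo.db.events REPLACE PARTITION FIELD category WITH days(ts);

-- ts is used by the current spec; per the report, this fails cleanly
ALTER TABLE demo.db.events DROP COLUMN ts;

-- category is only used by the older spec 0; per the report,
-- this is where the table becomes corrupted
ALTER TABLE demo.db.events DROP COLUMN category;
```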
-
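## Background
In our current system, SparkSQL mainly serves ad-hoc OLAP queries. SparkSQL obtains Hive table information through the metastore, so it can query Hive table data directly. Metastore performance therefore directly affects SparkSQL query speed.
This investigation started from a case reported by a user (a first-category problem), and along the way we ran into many situations that differed from what we had "imagined". The final result may be simple, but the tools used along the way are worth…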