-
I'm experimenting with GeoSpark and finding the spatial joins slower than expected.
I've set geospark.join.gridtype to "kdbtree" in my configuration below.
Is there something else I need to do to e…
-
In such cases, when writing a pyarrow table to Parquet with a provided schema that contains a field with `nullable=false` while the data contains an actual null value, the resulting p…
-
Steps to reproduce:
1. Create a Parquet file with a space in the column name, like so:
`pd.DataFrame(data=[1,2,3], columns=["col with space"]).to_parquet("test.parquet")`
2. Upload it to [demo.lynxkit…
-
Hi,
I have set up DuckDB in memory with DBeaver and executed:
```
INSTALL httpfs;
LOAD httpfs;
INSTALL aws;
LOAD aws;
CALL load_aws_credentials('default');
```
when trying to fetch the d…
-
I have an AWS Glue job that I'm trying to connect to my Spline server on EC2 using the Spline agent JAR with the HTTP lineage dispatcher.
The Glue logs show the correct producer URL, and …
-
I'm experiencing an issue with the Hudi configuration for the parquet compression codec. Despite setting the option "hoodie.parquet.compression.codec": "GZIP" in my Hudi write options, the output file…
-
### Apache Iceberg version
1.3.1
### Query engine
Spark
### Please describe the bug 🐞
**Setup**
We use the following spark libraries to write to Iceberg on EMR:
`org.apache.iceberg:ic…
-
Does Presto support a vectorized reader for Parquet files?
-
**Describe the bug**
Python integration tests failed on the latest EMR `6.12.0` cluster (using the spark-rapids `v23.06.0` jar built specifically for EMR). Failed files:
```
csv_test.py
datasourcev2_read_test.py
js…
-
## Bug Report
Errors occur when importing 1 billion records of HDFS data.
### Briefly describe the bug
```
2024.04.24 18:39:15.592303 [ 34916 ] {} PlanSegmentExecutor: [e…