-
Increasing spark.executor.memory or spark.executor.cores _worsens_ performance of HUDI Exporter
**To Reproduce**
Steps to reproduce the behavior:
1. Run the HUDI exporter varying spark.execut…
-
We request the community to **benchmark Record Level Indexing (RLI) against Simple Indexing**. The blog at https://hudi.apache.org/blog/2023/11/01/record-level-index/ provides a great comparison between …
-
### Description
[Sort] Support metrics for Apache Hudi
### Use case
_No response_
### Are you willing to submit PR?
- [X] Yes, I am willing to submit a PR!
### Code of Conduct
- [X] I agree t…
-
### Description
Problem
1. EMR Serverless (EMR on EKS too?) has major limitations when it comes to HDFS, which prevent loading pretrained models.
2. Caching in S3 requires passing AWS Access Ke…
-
### Problem Detail:
I am trying out the Hudi record index on my machine. Although my PySpark job runs smoothly and the data is written, along with the creation of the record_index file in Hudi's metadata table, it g…
-
**Describe the problem you faced**
I've been struggling with failing synchronization with the Glue Catalog. I have a process (an AWS Glue job) which reads from a Hudi table and then writes to the Hudi …
-
Hello guys,
All my deltacommits are written in under 1 hr, but a lot of time is wasted deleting the marker directory (shown in the screenshot), and I never got a proper understanding of why exactly it is hap…
-
**Describe the problem you faced**
I created a COW table named 'hudi_cow_tbl' and a MOR table named 'hudi_mor_tbl' in Flink SQL client mode, then inserted row data into both. After that, I…
-
Using Spark 3.4.0, Scala 2.12, Hudi 0.14.0 and Iceberg Spark runtime 1.4.2
- Created a Hudi source table in S3
- Ran the OneTable sync jar using config.yaml
```
sourceFormat: HUDI
targetFormats:…
```
-
Opened this on both the Hudi side and OneTable, as I am not sure which is the best place to open this.
Hudi GH issue: https://github.com/apache/hudi/issues/10784
# Description
As a user of Hudi Delta Streamer, I…