-
**Describe the problem you faced**
Stack: Hudi 0.13.1, EMR 6.13.0, Spark 3.4.1
We are writing to an MOR table in S3 using a Spark Structured Streaming job on EMR. Once this job has run for a whil…
-
Is there any documentation for using this on Azure Synapse? Currently I have a job running in Azure Databricks, and I now want to migrate it to Azure Synapse. The job has a lot of dependent assemblies, wh…
-
Decide on each component of the pipeline and how to implement
- Ingestion
- Queue
- Processing
- Storage
- Dashboard
Decide on infrastructure
- Local machine/one machine setup
- contain…
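
As a rough illustration of how the five components listed above could fit together, here is a minimal in-memory sketch. Every name in it (`ingest`, `run_pipeline`, `dashboard`, the `user,action` line format) is hypothetical and chosen only for the example; a real pipeline would swap each stand-in for an actual ingestion source, message queue, processing engine, store, and dashboard.

```python
import queue
from collections import Counter


def ingest(raw_lines):
    """Ingestion: parse raw 'user,action' lines into event dicts (hypothetical format)."""
    for line in raw_lines:
        user, action = line.strip().split(",")
        yield {"user": user, "action": action}


def run_pipeline(raw_lines):
    # Queue: decouples ingestion from processing (in-memory stand-in).
    q = queue.Queue()
    for event in ingest(raw_lines):
        q.put(event)

    # Storage: a Counter stands in for a real store.
    storage = Counter()
    while not q.empty():
        event = q.get()
        # Processing: count events per action.
        storage[event["action"]] += 1
    return storage


def dashboard(storage):
    """Dashboard: render stored counts as sorted text lines."""
    return [f"{action}: {count}" for action, count in sorted(storage.items())]


counts = run_pipeline(["alice,click", "bob,view", "alice,click"])
report = dashboard(counts)
print("\n".join(report))
```

Keeping each stage behind its own function makes it easier to replace one component at a time once the infrastructure decision is made.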
-
**_Tips before filing an issue_**
Hudi 0.14.0, hudi-flink-bundle: when writing timestamp data to a COW/MOR table with `read.utc-timezone=false` set, the time zone of the written data is still the UTC t…
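
To make the symptom easier to picture, the following sketch shows how one stored instant renders differently depending on the time zone applied when it is read. It uses neither Hudi nor Flink. The helper name `render_epoch_millis`, the sample epoch value, and the UTC+8 offset are all assumptions made only for illustration.

```python
from datetime import datetime, timedelta, timezone


def render_epoch_millis(epoch_ms, tz):
    """Format an epoch-millisecond instant as wall-clock text in the given zone."""
    return datetime.fromtimestamp(epoch_ms / 1000, tz=tz).strftime("%Y-%m-%d %H:%M:%S")


# Hypothetical stored value: one instant, expressed as epoch milliseconds.
epoch_ms = 1_700_000_000_000

# The same instant rendered in UTC and in an assumed local zone (UTC+8).
utc_text = render_epoch_millis(epoch_ms, timezone.utc)
local_text = render_epoch_millis(epoch_ms, timezone(timedelta(hours=8)))

print(utc_text)    # 2023-11-14 22:13:20
print(local_text)  # 2023-11-15 06:13:20
```

If a reader expects the second rendering but sees the first, the two sides disagree about which zone to apply, which is the kind of mismatch described above.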
-
One of the advantages of using Spark.NET over Scala/Java/Python is that you get the ecosystem benefits of coding in C# with Visual Studio, as well as how well this ecosystem works hand in hand with…
-
We are utilizing AWS Managed Apache Flink to handle streaming data and send it to S3 through the Hudi connector. Additionally, I'm running an AWS Glue ETL Job to execute GDPR-related custom data delet…
-
Since PR [Issue #405: DataWriter refactory #402](https://github.com/Qbeast-io/qbeast-spark/pull/402), a ConcurrentAppendException has started appearing during a write() and an optimize().
```
io.delta.exce…
```
-
We see duplicate data in our Hudi dataset
- Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
- Join the mailing list to engage in conversations and get faster support at dev…
-
# Batch Processing
- Runs on a scheduled basis
- May run for a longer period of time and write results to a SQL-like store
- May analyze all historical data at once
- Typically works with mutabl…
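
A minimal sketch of the batch pattern described above, assuming an in-memory SQLite database as the SQL-like store. The table name `user_totals`, the `(user_id, amount)` event shape, and the function `run_batch` are hypothetical. An external scheduler such as cron would be expected to call `run_batch` on each run, and that part is not shown.

```python
import sqlite3


def run_batch(conn, events):
    """One batch run: aggregate all historical events, then replace the stored totals."""
    totals = {}
    for user_id, amount in events:
        totals[user_id] = totals.get(user_id, 0) + amount

    conn.execute(
        "CREATE TABLE IF NOT EXISTS user_totals (user_id TEXT PRIMARY KEY, total INTEGER)"
    )
    # Each run recomputes from the full history, so old results are overwritten.
    conn.execute("DELETE FROM user_totals")
    conn.executemany("INSERT INTO user_totals VALUES (?, ?)", list(totals.items()))
    conn.commit()


conn = sqlite3.connect(":memory:")
history = [("alice", 5), ("bob", 3), ("alice", 2)]  # hypothetical historical data
run_batch(conn, history)
rows = conn.execute("SELECT user_id, total FROM user_totals ORDER BY user_id").fetchall()
print(rows)
```

Recomputing from the full history on every run keeps the job simple and idempotent, at the cost of a longer runtime as the history grows.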
-
Thank you for submitting a feature request. **Before proceeding, please review MLflow's [Issue Policy for feature requests](https://www.github.com/mlflow/mlflow/blob/master/ISSUE_POLICY.md#feature-req…