-
## Problem
### How much memory does a spark-dependencies job need to handle an index of about 12 GB?
I am totally new to the Spark project and I have tried several times to run a spark-depend…
-
## Are fast parallel writes in Delta Tables on S3 possible?
#### Which Delta project/connector is this regarding?
- [x] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [x] Other (Spark Conne…
-
## Bug
### Describe the problem
My simplified use case is to read from one location and append the data to a Delta Lake table with a Hive Metastore in batches. I have to do this for a couple…
-
I have hit this issue twice now in production and finally figured out a reproduction. Basically, if you evolve the schema of a non-nullable struct field to add a new nested field, you will get an NPE when try…
-
I'm trying out Hudi error tables, but I'm having trouble finding the documentation for the hoodie.errortable.write.class value. Could you please assist me?
# sample config
```
hoodie.datasource.…
-
## Bug
### Describe the problem
We have a merge operation using Delta + PySpark that deals with CDC data, mostly inserts and updates. In our testing we found that the behaviour when dealin…
-
My use case is to process a dataset with hundreds of partitions concurrently. The data is partitioned, and the partitions are disjoint. I was facing ConcurrentAppendException due to S3 not supporting the “put-…
-
### Describe the bug
Hello, I have an issue with PMM. I tested multiple times with mirror changes, but every time the bot ends up with stuck orders that are not cancelled by a time limit or similar. Error log s…
-
I hope I can get some help with a problem that I’ve been seeing in Deltastreamer, running on Mesos and built on the [docker.io/apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkadhoc_2.4.4:latest](http://docker…
-
I'm currently working on a project that needs asynchronous audio support, and I need both the player process and the generator process to work at the same time. I've been following your code on [readthedoc…