-
cc @Fokko.
This is a very simple implementation of an Iceberg client for Dask. It works on the few datasets I have available, including:
- version metadata choice
- snapshot time …
-
ETA: 2024-06-30
We want to use the IPv4 addresses of SPARK nodes as the scarce resource that makes it expensive for a single party to run many nodes. At the moment, we rely on the trusted spark-api service to re…
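
To illustrate the idea of treating an IPv4 address as a scarce resource, here is a minimal sketch that admits at most a fixed number of nodes per address. The function name `admit_nodes`, the `(node_id, ip)` claim format, and the limit of one node per address are assumptions made for illustration. They are not taken from the SPARK codebase or the spark-api service.

```python
import ipaddress
from collections import Counter

# Hypothetical limit: how many nodes one IPv4 address may back.
MAX_NODES_PER_IPV4 = 1


def admit_nodes(claims, limit=MAX_NODES_PER_IPV4):
    """Return the node IDs admitted, in arrival order.

    `claims` is an iterable of (node_id, ip_string) pairs. A node is
    admitted only if its address parses as IPv4 and that address has
    not already reached `limit` admitted nodes.
    """
    seen = Counter()
    admitted = []
    for node_id, ip_string in claims:
        try:
            ip = ipaddress.IPv4Address(ip_string)
        except ipaddress.AddressValueError:
            continue  # not a valid IPv4 address: reject the claim
        if seen[ip] < limit:
            seen[ip] += 1
            admitted.append(node_id)
    return admitted
```

With this kind of rule, running many nodes requires controlling many distinct IPv4 addresses, which is what would make it costly for a single party.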
-
### What happened?
I am trying to deploy a Beam job (Python Beam) that runs on a PortableRunner (Flink Runner) in my Kubernetes cluster.
I have not previously experienced issues with Beam using the Flin…
-
## Description
Spark operator pod crashes with ImagePullBackOffError.
- [x] ✋ I have searched the open/closed issues and my issue is not listed.
## Reproduction Code [Required]
Steps …
-
We have to use a private Maven repository proxy to download the packages listed under `deps`.
However, the SparkApplication stays in the `SUBMISSION_FAILED` state because it is unable to verify the certificate of…
-
### Issues Policy acknowledgement
- [X] I have read and agree to submit bug reports in accordance with the [issues policy](https://www.github.com/mlflow/mlflow/blob/master/ISSUE_POLICY.md)
### Willi…
-
### Description
We are replacing Presto with Velox in our system, but we have already done a lot of our own performance research and optimization on Presto. We found that there is a performance gap b…
-
## Description
My PySpark application needs to access a GCP storage bucket, and I've mounted the secret (containing the service account key) as a volume mount. However, I am getting an error.
```
Trace…
-
Configuring monitoring for Spark applications is slightly confusing.
We set up monitoring with this:
```yaml
monitoring:
  metricsProperties: |
    *.sink......
    *.source....
  expo…
-
**_Tips before filing an issue_**
- Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
- Join the mailing list to engage in conversations and get faster support at dev-subscri…