-
[Current documentation on using RayDP](https://github.com/oap-project/raydp/blob/master/doc/spark_on_ray.md) with Ray Client only says: "RayDP works the same way when using ray client. However, spark …
-
Hi,
I created a repo where I replaced hostPaths with persistentVolumeClaims. I have only tested this new configuration with minikube, but I think it should work with a real cluster too.
You can find the…
-
Running PyTorch PySpark Estimator training on multiple nodes on Kubernetes with big models sometimes raises `RuntimeError: Socket Timeout` when workers call init_process_group. The traceback is below:
Exc…
-
# tfpark estimator_inception
This example has two parts that need improvement.
## command
```shell
/opt/spark/bin/spark-submit
--master k8s://https://127.0.0.1:8443
--deploy-mode cluster
--c…
-
@hiboyang Have you tried using the remote shuffle service with the Spark operator (Spark on K8s operator)?
I tested it with the client jar in my 'SparkApplication' image and it works as expected.
Alth…
-
Done:
- Creating a Spark scalable distributed cluster.
- Creating a simple demo app allowing to measure performance.
TODO:
- Add 1, 2, and 3 worker nodes, compare performance, and measure scala…
-
### What is the bug?
A Spark 3.5 streaming job reads data from Kafka, but writing to OpenSearch fails with the following error. Checking the _cluster/health response through the same endpoints shows they are working f…
-
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
### Search before asking
- [X] I have searched in the [issue…
-
Batch jobs need to be scheduled on yarn/kubernetes, which introduces an overhead that is hard to avoid.
If certain conditions are met, we can also schedule batch jobs in the same way as synchronous j…
-
I built off the 2.2 branch. It works great until it tries to write to Parquet files.
Then it can't seem to initialize or find org.xerial.snappy.Snappy. I see snappy in the jars directory of the spark…