cubed-dev / cubed

Bounded-memory serverless distributed N-dimensional array processing
https://cubed-dev.github.io/cubed/
Apache License 2.0

Executor for Apache Spark #499

Open rbavery opened 4 months ago

rbavery commented 4 months ago

Could Spark be added as a supported executor?

Maybe RDD.map or RDD.mapPartitions would be the right way to map a function over the tasks, similar to map_unordered in the Lithops executor.

https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.RDD.mapPartitions.html#pyspark.RDD.mapPartitions
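To make the RDD.map idea concrete, here is a minimal sketch of what the mapping step might look like. The helper name map_with_spark is hypothetical and is not part of Cubed or Spark; it only assumes that sc behaves like a pyspark.SparkContext, using the parallelize, map and collect calls. It is not how a real Cubed executor would necessarily be structured.

```python
from typing import Any, Callable, Iterable, List


def map_with_spark(sc: Any, func: Callable[[Any], Any], inputs: Iterable[Any]) -> List[Any]:
    """Apply func to each input as a separate Spark task and gather the results.

    Hypothetical helper: `sc` is assumed to be a pyspark.SparkContext
    (or any object exposing the same parallelize/map/collect methods).
    """
    items = list(inputs)
    if not items:
        return []
    # One partition per input, so each call to func is scheduled as its own task.
    rdd = sc.parallelize(items, numSlices=len(items))
    # RDD.map is lazy; collect() triggers execution and returns results in input order.
    return rdd.map(func).collect()
```

With a real cluster this could be called as map_with_spark(SparkContext.getOrCreate(), func, inputs). Note that collect() waits for every task and returns results in input order, so it does not give the as-completed behaviour that the name map_unordered suggests; that difference would need to be handled separately.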

To support this, a guess would need to be made up front about how much memory to reserve for Python UDFs. It sounds like this would currently be set globally, but maybe it could later be set on a per-operator basis?
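As a rough illustration of the up-front guess, the arithmetic might look like the sketch below. The function per_task_memory_budget and its parameter names are hypothetical. It assumes the Python memory configured for a Spark executor is shared evenly among its cores with one task per core, and that the resulting numbers would be used as a single global memory budget and reserved-memory guess for the whole computation.

```python
def per_task_memory_budget(
    pyspark_memory_bytes: int,
    executor_cores: int,
    reserved_guess_bytes: int,
) -> dict:
    """Estimate a per-task memory budget from executor-level settings.

    Assumption: Python memory for an executor is split evenly across its
    cores, one concurrent task per core. All names here are illustrative.
    """
    if executor_cores < 1:
        raise ValueError("executor_cores must be at least 1")
    # Memory each concurrent Python task can use in total.
    allowed = pyspark_memory_bytes // executor_cores
    if reserved_guess_bytes >= allowed:
        raise ValueError("reserved memory guess leaves nothing for array data")
    return {
        "allowed_mem": allowed,
        "reserved_mem": reserved_guess_bytes,
        # What remains for array chunks after the fixed overhead guess.
        "usable_mem": allowed - reserved_guess_bytes,
    }
```

For example, 8 GiB of Python memory on a 4-core executor with a 200 MiB reserved guess leaves 2 GiB per task, of which about 1.8 GiB is usable for data. A per-operator version would repeat the same calculation with a different reserved guess for each operation.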

tomwhite commented 4 months ago

A Spark executor would be a great addition. I just added some notes about implementing a new executor in #498 if you're interested in having a go at this @rbavery?

rbavery commented 4 months ago

I'm definitely interested, thanks for adding notes. It's possible I won't make quick (or any) progress because of other responsibilities 😬