bolt-project / bolt

Unified interface for local and distributed ndarrays
http://bolt-project.github.io/
Apache License 2.0
158 stars 20 forks source link

Make Local and Spark Arrays have identical APIs #100

Open kmader opened 8 years ago

kmader commented 8 years ago

For testing and developing functions it would be nice if Local and Spark Arrays had identical functions. This might be easiest by a faking SparkContext and RDD and using BoltArraySpark for everything.

jwittenbach commented 8 years ago

@kmader thanks for the interest in this!

We thought about trying something similar a while back. Recently though, we have been leaning more toward removing the local version of the BoltArray entirely and making the project completely about a distributed ndarray. The thought was that this might be best in terms of modularity -- then a higher level project could wrap the NumPy array, Bolt array, and maybe even the Dask array.

Would be interested to hear your thoughts on this.

kmader commented 8 years ago

@jwittenbach That seems like it might be a good direction to go in. I started on a localspark implementation that simply faked the Spark backend so commands like chunk and unchunk work the same with or without pyspark (https://github.com/bolt-project/bolt/pull/101/files)

Another approach would be to have a generic ParallelBackend for BoltArrays that could be implemented in Spark, Flink, Multiprocessing, etc and allow the operations to all be done the same way.