chrislusf / gleam

Fast, efficient, and scalable distributed map/reduce system with DAG execution, in memory or on disk, written in pure Go, runs standalone or distributed.

Cached/persistent in-memory dataset #161

Open · tux-mind opened this issue 5 years ago

tux-mind commented 5 years ago

Hello, I am trying to figure out whether gleam is the right tool for the job.

My question is similar to #27: I have a large amount of data that I want to keep in memory so that I can run a certain operation over it several times in a distributed fashion.

For clarity, here is a simplified pseudo-code sketch of my use case:

data = seq(1, 1024*3)
mapping = (n) -> n*56
filter_op = (n) -> n < 123456

My goal is to have 3 nodes, each holding 1024 of the numbers. Each node would apply mapping to its numbers, discarding results that do not satisfy the filter_op predicate.

I would like to be able to run this multiple times, changing mapping or filter_op but keeping the 1024 numbers in the nodes' memory.
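To make the question concrete, here is a rough sketch of what a single pass might look like with gleam's current Go API. The in-memory integer source (Ints) is my assumption; if it is not available, the numbers could instead come from a file via file.Txt. The map and the filter are folded into one registered mapper, since a gleam mapper controls what it emits:

package main

import (
	"github.com/chrislusf/gleam/distributed"
	"github.com/chrislusf/gleam/flow"
	"github.com/chrislusf/gleam/gio"
)

// The map and the filter are a single registered mapper: a row that
// fails the predicate is simply never emitted.
var MapAndFilter = gio.RegisterMapper(mapAndFilter)

func main() {
	// gio.Init() must run first; when gleam re-executes this binary on
	// the agents, it dispatches straight into the registered mapper.
	gio.Init()

	// The dataset from the pseudo-code: seq(1, 1024*3).
	data := make([]int, 1024*3)
	for i := range data {
		data[i] = i + 1
	}

	// Assumption: an in-memory source such as Ints exists and the data
	// ends up partitioned across the 3 nodes.
	flow.New("map and filter pass").
		Ints(data).
		Map("mapAndFilter", MapAndFilter).
		Printlnf("%d").
		Run(distributed.Option())
}

func mapAndFilter(row []interface{}) error {
	n := gio.ToInt64(row[0]) // the original number
	v := n * 56              // mapping
	if v < 123456 {          // filter_op
		gio.Emit(v)
	}
	return nil
}

What I cannot see how to express is the part between runs: calling Run repeatedly with a different mapper while the 3×1024 numbers stay resident in the agents' memory, instead of being re-created and re-shipped for every new flow.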

Thank you in advance for your time and help!

chrislusf commented 5 years ago

This is not currently supported, but it sounds like an interesting use case.