Open cristiprg opened 7 years ago
@cristiprg Redis is an in-memory database that also persists to disk. Even if that were not the case, why would being in-memory change everything?
@jeyhunkarimov For two reasons: 1) Multiple processes should have access to the same data. For example, if computer1 (or JVM1) creates a model, then computer2 should also have access to that model. This is not easy with purely in-memory storage (only with tricky shared memory at the OS or JVM level). 2) If the process dies, everything in its memory is deleted, so the model is lost. I'm not sure, but maybe it can be recovered from the checkpoints that Spark makes, though I don't think that is trivial.
Please correct me if I'm overthinking this.
@cristiprg Let Redis handle this and focus on the solution. Redis handles those issues because it supports concurrent updates. It keeps a cache in memory and persistent storage on disk, and synchronises them. Moreover, to gain the speedup, we are ready to sacrifice some accuracy due to the possible issues you listed above.
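
To make the idea concrete, here is a rough sketch of how one process could publish a model and another could read it back through a shared key-value store. The key name `model:latest`, the helper functions, and the toy model dictionary are all hypothetical and not taken from the project. `InMemoryStore` is only a stand-in so the snippet runs without a server; the assumption is that a redis-py client (`redis.Redis(host="localhost", port=6379)`) would be passed in its place, since it exposes `set` and `get` calls of the same shape.

```python
import json

MODEL_KEY = "model:latest"  # hypothetical key name, chosen for illustration


class InMemoryStore:
    """Stand-in with the same set/get shape as a redis-py client."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value
        return True

    def get(self, key):
        # Returns None for a missing key, like a Redis GET
        return self._data.get(key)


def save_model(store, model):
    """Serialize the model to JSON and write it under the shared key."""
    store.set(MODEL_KEY, json.dumps(model))


def load_model(store):
    """Read the model back; returns None if no model has been saved yet."""
    raw = store.get(MODEL_KEY)
    # json.loads accepts both str and bytes, so this also works when the
    # store returns bytes, as a redis-py client does by default.
    return None if raw is None else json.loads(raw)


# Assumption: with a real server, replace this with a redis-py client.
store = InMemoryStore()

# "Process 1" creates the model and publishes it.
save_model(store, {"weights": [0.5, -1.2], "bias": 0.1})

# "Process 2" (or process 1 after a restart) reads it back.
restored = load_model(store)
```

With a real Redis server both processes would connect to the same host, so the model would outlive the process that created it, subject to how persistence is configured on the server.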