Q1: Question on Redis

cristiprg / BDAPRO.GlobalStateML

This repository contains my solution to the project "Machine learning algorithms with global state" from the BDAPRO class at TU Berlin. (The repo is based on BDAPRO.WS1617)

Apache License 2.0


Q1: Question on Redis #5

Open cristiprg opened 7 years ago

cristiprg commented 7 years ago

Redis is an in-memory database rather than an on-disk one. Doesn't this change everything?

jeyhunkarimov commented 7 years ago

@cristiprg Redis is an in-memory database that also persists its data to disk. Even if that were not the case, why would being in-memory change everything?

cristiprg commented 7 years ago

@jeyhunkarimov For two reasons: 1) Multiple processes should have access to the same data. For example, if computer1 (or JVM1) creates a model, then computer2 should also have access to that model. This is not easily possible with in-memory storage (only with tricky shared memory at the OS or JVM level). 2) If the process dies, everything in its memory is deleted, so the model is lost. I'm not sure, but maybe it can be recovered from the checkpoints that Spark makes, though I don't think that is trivial.

Please correct me if I'm overthinking this.

jeyhunkarimov commented 7 years ago

@cristiprg Let Redis handle this, and focus on the solution. Redis handles those issues because it supports concurrent updates. It keeps the data in memory, persists it to disk, and keeps the two in sync. Moreover, to gain the speedup, we are ready to sacrifice some accuracy due to the possible issues you listed above.
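
To make the first point concrete, here is a minimal sketch of how two processes could share a model through Redis instead of through their own memory. It is not taken from this repository. The key name `global_model:weights`, the JSON encoding, and the helper names `save_model` and `load_model` are assumptions for illustration. The helpers only rely on a client object that offers `set(key, value)` and `get(key)`, which is the shape of the redis-py client; `FakeClient` is a hypothetical in-process stand-in so the sketch can be tried without a running server.

```python
import json

# Hypothetical key under which the shared model is stored.
MODEL_KEY = "global_model:weights"


def save_model(client, weights, key=MODEL_KEY):
    """Serialize the weights as JSON and store them under a single key."""
    client.set(key, json.dumps(weights))


def load_model(client, key=MODEL_KEY):
    """Return the stored weights, or None if no model has been saved yet."""
    raw = client.get(key)
    if raw is None:
        return None
    return json.loads(raw)


class FakeClient:
    """In-process stand-in exposing the same set/get shape (illustration only)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        # Store bytes, mirroring what a real client returns from get().
        self._data[key] = value.encode("utf-8") if isinstance(value, str) else value
        return True

    def get(self, key):
        return self._data.get(key)


if __name__ == "__main__":
    client = FakeClient()
    save_model(client, [0.1, 0.2, 0.3])   # "process 1" publishes the model
    print(load_model(client))             # "process 2" reads it back
```

With a real server, the same helpers should work with `redis.Redis(host="localhost", port=6379)` from redis-py (host and port are assumptions here). Every process or machine that connects to that server sees the same key, so the model no longer lives only in one JVM's memory.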