amplab / velox-modelserver

http://amplab.github.io/velox-modelserver/
Apache License 2.0
110 stars 26 forks source link

Added a broadcast via spark & switched to a single long-lived spark context #51

Closed tomerk closed 9 years ago

tomerk commented 9 years ago

Relates to (but doesn't fully cover) issues #48 #49 and #46

tomerk commented 9 years ago

@dcrankshaw

dcrankshaw commented 9 years ago

Overall this looks good. I like having the long-lived SparkContext. And using Spark for broadcasts seems to significantly simplify things in the long-run, at the cost of a well-understood performance penalty (an extra copy). However, I'm a little concerned about having a single shared SparkContext and BroadcastProvider. It should be the case that a single model is only ever interacting with Spark from a single thread, because bulk retrain is guarded by a global per-model lock. However, there is nothing to stop multiple models from doing bulk retrain at once. Should there be?

dcrankshaw commented 9 years ago

LGTM