Closed: tomerk closed this pull request 9 years ago
@dcrankshaw
Overall this looks good. I like having the long-lived SparkContext, and using Spark for broadcasts seems to significantly simplify things in the long run, at the cost of a well-understood performance penalty (an extra copy). However, I'm a little concerned about having a single shared SparkContext and BroadcastProvider. It should be the case that a single model only ever interacts with Spark from a single thread, because bulk retrain is guarded by a global per-model lock. However, there is nothing to stop multiple models from doing a bulk retrain at the same time. Should there be?
LGTM
Relates to (but doesn't fully cover) issues #48, #49, and #46.