vsuthichai opened 7 years ago
There's a slowdown in VW cache distribution at the beginning of the Spark job. Refactor this logic to zip the VW dataset and distribute it to the executors before VW cache generation begins.
In local mode, cache generation runs only once, whereas executing on the cluster requires generating the cache on every node.
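A minimal sketch of the proposed zip-then-distribute step, written in Python purely for illustration (the issue does not specify an implementation). The function names `zip_dataset` and `unpack_dataset` and the archive name `vw_dataset.zip` are hypothetical. The idea is that the driver packs the dataset once, and each executor unpacks it into a node-local directory before it starts generating its own VW cache. Shipping the archive is assumed here to go through PySpark's `SparkContext.addFile` and `SparkFiles.get`; the real project may wire this differently.

```python
import os
import shutil
import tempfile


def zip_dataset(dataset_dir: str, out_dir: str) -> str:
    """Pack the VW dataset directory into one zip archive (intended to run once on the driver)."""
    # Hypothetical archive name; make_archive appends ".zip" and returns the archive path.
    base_name = os.path.join(out_dir, "vw_dataset")
    return shutil.make_archive(base_name, "zip", root_dir=dataset_dir)


def unpack_dataset(archive_path: str, target_dir: str) -> str:
    """Unpack the archive into a node-local directory (intended to run on each executor)."""
    os.makedirs(target_dir, exist_ok=True)
    shutil.unpack_archive(archive_path, target_dir, "zip")
    return target_dir


# Assumed PySpark wiring (illustrative only, not taken from the issue):
#   sc.addFile(zip_dataset(local_dataset_dir, staging_dir))
#   # later, inside a task on each executor, before VW cache generation:
#   local_dir = unpack_dataset(SparkFiles.get("vw_dataset.zip"), node_local_dir)
```

Packing into a single archive means the dataset is transferred as one file per node rather than file by file, which is the slowdown the issue points at; the cache itself is still generated separately on every node, as noted above.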