Closed ericmjonas closed 8 years ago
Also, @tomerk suggests that the reshuffle that follows this warning: "15/12/27 21:22:21 WARN BlockWeightedLeastSquaresEstimator: Partitions do not contain elements of the same class. Re-shuffling" can sometimes create empty partitions as well.
Hmm, the reshuffle creates an RDD with exactly numClasses partitions afaik. Does this happen when you have no examples in a class? Anyway, we can make the rest of the algorithm work with empty partitions.
@shivaram In that case I was running up against https://github.com/amplab/keystone/issues/197 so all my data were being assigned the same class label. That said, empty partitions are still a problem.
Ah yes, that makes sense. If you have it handy, could you paste the stack trace you get when you have empty partitions?
After discussion with @tomerk, it seems that it would be really useful to have the various Block Solvers not choke on empty partitions. Empty partitions can arise in the course of cross-validation when you want to filter your data RDD into a "train" RDD and a "test" RDD.
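
To make the failure mode concrete, here is a minimal sketch in plain Scala (no Spark dependency) of how a partition-preserving filter can leave some partitions empty, and how a per-partition step can guard against that. The names `partitionStats` and `filterPartitions` are hypothetical and are not part of KeystoneML; partitions are simulated as a `Seq[Seq[Double]]`, and the actual Block Solvers may need a different fix. The guard has the `Iterator => Iterator` shape that `RDD.mapPartitions` expects, so the same idea should carry over.

```scala
object EmptyPartitionSketch {
  // Hypothetical per-partition step: returns (sum, count) for a partition.
  // An empty partition emits nothing instead of failing or emitting a bogus value.
  def partitionStats(it: Iterator[Double]): Iterator[(Double, Long)] =
    if (!it.hasNext) Iterator.empty
    else {
      var sum = 0.0
      var n = 0L
      it.foreach { x => sum += x; n += 1 }
      Iterator((sum, n))
    }

  // Simulates a filter that keeps partition boundaries intact (as RDD.filter does),
  // so a partition whose elements are all rejected stays behind as an empty one.
  def filterPartitions(parts: Seq[Seq[Double]])(keep: Double => Boolean): Seq[Seq[Double]] =
    parts.map(_.filter(keep))

  def main(args: Array[String]): Unit = {
    // Four simulated partitions with two elements each.
    val parts = Seq(Seq(0.0, 1.0), Seq(2.0, 3.0), Seq(4.0, 5.0), Seq(6.0, 7.0))

    // Assumed "train" split: keep values below 4, which empties the last two partitions.
    val train = filterPartitions(parts)(_ < 4.0)

    // Only non-empty partitions contribute a (sum, count) pair.
    val stats = train.flatMap(p => partitionStats(p.iterator))
    val (sum, n) = stats.foldLeft((0.0, 0L)) { case ((s, c), (ps, pc)) => (s + ps, c + pc) }
    println(s"non-empty partitions: ${stats.size}, sum: $sum, count: $n")
  }
}
```

With a real RDD the guard would be used as `rdd.mapPartitions(partitionStats)`; any downstream code that assumes one result per partition would then need to tolerate fewer results than partitions.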