garydoranjr / misvm

Multiple-Instance Support Vector Machines
BSD 3-Clause "New" or "Revised" License
238 stars 81 forks

MissSVM result #14

Closed callamuyu closed 5 years ago

callamuyu commented 6 years ago

Running the released code, however, the accuracy of MissSVM is only 40%. I have read the original paper, but have not found any difference between the code and the paper. What do you think is the reason the accuracy is only 40%? Thank you!!

garydoranjr commented 6 years ago

Hello @callamuyu, are you comparing your own results to the original paper? On which datasets? It could be the folds you used, the parameter values selected, or some other factor that's causing a difference.

ghost commented 5 years ago

I'm seeing the same thing: example.py reports an accuracy of 40% for MissSVM. Even more oddly, adjusting the number of iterations has little effect, but setting the iterations to 0 actually makes it 80% accurate.

garydoranjr commented 5 years ago

@doktorschrott I didn't realize that @callamuyu was referring to the values output by the example code. I tracked this down, and the reason seems to be that this algorithm is particularly sensitive to the input features being normalized (approximately mean zero and standard deviation of 1). Having approximately mean-centered data seems to be the more important requirement.
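To illustrate the normalization being described, here is a minimal sketch (my own, not the exact code added to example.py) of standardizing bag features in the multiple-instance setting, where each bag is an `(n_i, d)` array of instances. The statistics are computed over all instances pooled across the training bags and then applied to both training and test bags; the function name `normalize_bags` is hypothetical.

```python
import numpy as np

def normalize_bags(train_bags, test_bags):
    """Mean-center and scale bag features using training-set statistics."""
    # Pool every instance from every training bag into one (N, d) array.
    all_instances = np.vstack(train_bags)
    mean = all_instances.mean(axis=0)
    std = all_instances.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    norm = lambda bag: (bag - mean) / std
    return [norm(b) for b in train_bags], [norm(b) for b in test_bags]

# Tiny synthetic demo: two training bags and one test bag in 2 dimensions.
train = [np.array([[1.0, 2.0], [3.0, 4.0]]), np.array([[5.0, 6.0]])]
test = [np.array([[2.0, 3.0]])]
train_n, test_n = normalize_bags(train, test)

# After normalization, the pooled training instances have mean ~0, std ~1.
pooled = np.vstack(train_n)
print(np.allclose(pooled.mean(axis=0), 0.0))  # → True
```

Note that the test bags are transformed with the training statistics, so the test features are only approximately standardized; recomputing statistics on the test set would leak information.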

Using 0 iterations was probably using the initialized solution, which happened to be better than what the algorithm was doing with the unnormalized data (predicting the negative class for all bags).

Thank you for pointing out this behavior in the example. Of course, this example does not do proper cross-validation (only a single train-test split with fixed parameters), so it is not intended to be a rigorous experiment, only a demonstration of how to call the code. On the other hand, it is confusing to have this algorithm do so poorly in the example, so I've updated it to include this additional normalization step. Now the performance on this train/test split is closer to the result reported in the original paper. It is still not optimal because only a linear kernel is used and parameters are not optimized, but it is much better than it was before.