cjlin1 / libsvm

LIBSVM -- A Library for Support Vector Machines
https://www.csie.ntu.edu.tw/~cjlin/libsvm/
BSD 3-Clause "New" or "Revised" License

Accuracies lower than 50% if the random seed is unlucky #207

Closed · giulio-datamind closed this issue 11 months ago

giulio-datamind commented 11 months ago

In the context of binary SVM classification problems, while experimenting on this input data (consisting of 18 samples, divided into two classes of equal cardinality), I ran into a model with 0% accuracy. The parameters I used for training are: -g 16 -c 32 -b 1.

While trying to understand the reasons for this 0% accuracy, I concluded that it is a consequence of the choice of random seed.

Hence, I ran multiple tests with different seeds. Using the first 1000 integers as seeds, I obtained results that can be summarized as follows:

I noticed that the following statement holds for all 1000 seeds:

the accuracy of the trained model is greater than 50% if and only if probA is lower than 0.

In fact, the condition probA < 0 corresponds to an increasing sigmoid function modelling the decision values' response.

Given this long premise, my questions are:

  1. am I somehow wrong about this, or can the current version of libSVM give accuracies below 50% if the combination of input data and random seed is unlucky?
  2. can we do something to avoid falling into those cases, for example by constraining the sigmoid modelling function to always be increasing?

Thank you very much to anyone who contributes.

PS: searching for similar questions, I didn't find an answer to mine, but I suspect it is related to issues

#152, #153 and #155.

cjlin1 commented 11 months ago

The main issue is that you have too little data. If -b 1 is used, which enables probabilistic outputs, we internally conduct a cross-validation process, so there is some randomness. To obtain deterministic results, either fix the seed or, if probability outputs are not needed, remove -b 1.
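To illustrate the "fix the seed" suggestion: the internal cross-validation shuffles the data with rand(), so seeding the C random generator once before svm_train() makes repeated -b 1 runs reproducible. A minimal sketch (first_rand_after_seed is a hypothetical helper, shown only to demonstrate that reseeding restarts the same pseudo-random sequence):

```c
#include <stdlib.h>

/* Reseed the C random generator and return the first value of the
   resulting sequence. Calling srand() with the same seed before
   svm_train() reproduces the same internal cross-validation split,
   and therefore the same probA/probB. */
int first_rand_after_seed(unsigned int seed)
{
    srand(seed);
    return rand();
}
```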

giulio-datamind commented 11 months ago

@cjlin1 thank you very much for your reply.

In my case I need probabilistic results, so the option -b 1 is mandatory.

Yes, I could change the seed to make this particular example work, but in general I cannot consider this a solution, since other input data could lead to the same low-accuracy problem.

Why do you think that constraining the sigmoid function to always be increasing (given that we expect high probabilities for samples labeled +1 and low probabilities for those labeled -1) could not be a solution? Are there any drawbacks to somehow constraining probA < 0 inside the sigmoid_train function?

cjlin1 commented 11 months ago

I think the sigmoid we have is always increasing, so I don't understand your question.

giulio-datamind commented 11 months ago

I'm sorry: I probably created some confusion with the sign of probA (I edited the messages above to fix them). Let me try to explain better in other words.

Consider the input data attached to my first message. Working on this data, I found that after setting the random generator seed at startup to some integer, the resulting trained probabilistic model normally (i.e., for about 96% of these seeds) has 100% accuracy. However, for some seeds (only about 3.5%; an example is srand(42) on my machine) the trained model has 0% accuracy.

I noticed that 0% accuracy models have a positive value of probA, while 100% accuracy models have a negative one. The sigmoid function is defined as SF(x) = 1/(1+exp(probA*x+probB)), where x is the decision value. I stated that 0% accuracy models are associated with a decreasing sigmoid function because, as x tends to +infinity, SF(x) tends to 1 if probA < 0 and to 0 if probA > 0.

Given that we expect high probabilities (i.e., high values of SF(x)) for samples labeled +1 and low probabilities for those labeled -1, I suspect there is room for improvement if we constrain probA to always be lower than 0.

I attach the adaptation of svm-train.c that I used for the experiments (https://github.com/cjlin1/libsvm/files/13238410/svm-train-adapted.zip), hoping it can help.

cjlin1 commented 10 months ago

What if you put 10 copies of the same data together as input? I suspect the situation may be improved.


giulio-datamind commented 10 months ago

Yes, you are correct.

By simply replicating the input data 10 times, the input file becomes like this; with this input, all 1000 tested seeds lead to a 100% accuracy model.

I think, however, that there is no reason not to try to directly improve the algorithm so that it also works well for low-cardinality datasets, which are common in practice.

giulio-datamind commented 9 months ago

I tried to impose the constraint probA < 0 by adding the line

newA = newA > -eps ? -2 * eps - newA : newA;

immediately after the line

newA = A + stepsize * dA;

in the backtracking loop of the sigmoid_train function. Furthermore, I set the initial value to A = 1 instead of A = 0.

In practice, I implemented the constraint by reflecting, at every iteration, the point (A, B) of the parameter search space around the line A = -eps.

With these changes, even if the random seed choice is unfortunate, the accuracies of the trained models never fall below 50%. This happens because, in the worst case, all the samples are classified into the same class by a very flat sigmoid (when A is near 0); but, unlike before, it can no longer happen that the classification is opposite to the labeling.
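The reflection step above can be sketched as a standalone helper (reflect_A is a hypothetical name; eps stands for the small tolerance used inside sigmoid_train):

```c
/* Sketch of the proposed constraint probA < 0 inside the Newton
   backtracking step of sigmoid_train. If the candidate newA crosses
   the line A = -eps, reflect it back around that line:
   newA -> -2*eps - newA, so the iterate stays strictly below -eps. */
double reflect_A(double newA, double eps)
{
    return (newA > -eps) ? (-2.0 * eps - newA) : newA;
}
```

The reflection keeps the step size of the Newton update unchanged in magnitude; only the side of the line A = -eps is flipped.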

Are there any disadvantages I didn't foresee in these modifications?

cjlin1 commented 9 months ago

It's OK to impose such a constraint, but then this becomes a constrained optimization problem. Either a constrained optimization algorithm must be used, or you need to prove the convergence of your modified procedure.
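For reference, sigmoid_train performs Platt's maximum-likelihood fit of the sigmoid parameters; with the proposed constraint the problem becomes the following (a sketch: here $f_i$ are the decision values, $t_i$ the smoothed Platt targets, and $\varepsilon$ the small tolerance mentioned above):

```latex
\min_{A,\,B}\; -\sum_i \Big[ t_i \log p_i + (1-t_i)\log(1-p_i) \Big],
\qquad p_i = \frac{1}{1+\exp(A f_i + B)},
\qquad \text{subject to } A \le -\varepsilon .
```

The unconstrained version is what the current Newton-with-backtracking iteration solves; adding the inequality constraint is what would require either a projected/constrained method or a separate convergence argument for the reflection heuristic.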
