The main issue is that you have too little data. If -b 1 (probabilistic outputs) is used, we internally conduct a cross-validation process, so there is some randomness. To get deterministic results, either fix the random seed or, if probability outputs are not needed, remove -b 1.
@cjlin1 thank you very much for your reply.
In my case I need probabilistic results, so the option -b 1 is mandatory.
Yes, I could change the seed to make this particular example work, but in general I cannot consider this a solution, since for other input data I could fall into the same low-accuracy problem.
Why do you think that constraining the sigmoid function to be always increasing (given that we expect high probabilities for samples labelled +1 and low probabilities for those labelled -1) could not be a solution? Are there any drawbacks to somehow constraining probA < 0 inside the sigmoid_train function?
I think we do have that the sigmoid is always increasing, so I don't understand your question.
I'm sorry: I probably made some confusion with the sign of probA (I edited the messages above to fix them). Let me try to explain better in other words.
Consider the input data attached to my first message. Working on these data, I observed that when the random generator seed is set (at startup) to some integer, then normally (i.e., for about 96% of these seeds) the resulting trained probabilistic model has 100% accuracy. However, for some seeds (only about 3.5%; one example is srand(42) on my machine) the trained model has 0% accuracy.
I noticed that 0%-accuracy models have a positive value of probA, while 100%-accuracy models have a negative one. The sigmoid function is defined as SF(x) = 1/(1+exp(probA*x+probB)), where x is the decision value. I stated that 0%-accuracy models are associated with a decreasing sigmoid function because, as x tends to +infinity, SF(x) tends to 1 if probA < 0 and to 0 if probA > 0.
Considering that we expect high probabilities (i.e., high values of SF(x)) for samples labelled +1 and low probabilities for those labelled -1, I suspect there is room for improvement if we constrain probA to always be lower than 0.
I attach the adaptation of svm-train.c that I used to make the experiments, hoping it can help.
What if you put 10 copies of the same data together as input? I suspect the situation may be improved.
Yes, you are correct.
By simply replicating the input data 10 times, the input file becomes like this; with this input, all 1000 tested seeds lead to a 100%-accuracy model.
I think, however, that there is no reason not to try to improve the algorithm directly, so that it also works well for lower-cardinality datasets, as is often the case.
I tried to impose the constraint probA < 0 by adding the line `newA = newA > -eps ? -2 * eps - newA : newA;` immediately after `newA = A + stepsize * dA;` in the backtracking loop of the sigmoid_train function. Furthermore, I set the initial value to A = 1 instead of A = 0.
In practice, I implemented the constraint by reflecting, at every iteration, the point (A, B) of the parameters' search space around the line A = -eps.
With these changes, even if the random seed choice is unfortunate, the accuracy of the trained models never falls below 50%. This happens because, in the worst case, a very flat sigmoid (when A is near 0) classifies all samples into the same class; but, unlike before, the classification can no longer be opposite to the labelling.
Are there any disadvantages I didn't foresee in these modifications?
It's OK to impose such a constraint, but then this becomes a constrained optimization problem. Either a constrained optimization algorithm must be used, or you need to prove the convergence of your setting.
In the context of binary SVM classification problems, while doing experiments on this input data (consisting of 18 samples, divided into two classes of equal cardinality), I ran into a model with 0% accuracy. The parameters I used for training are -g 16 -c 32 -b 1.
While attempting to understand the reasons for this 0% accuracy in depth, I concluded that it is a consequence of the choice of the random seed.
Hence, I ran multiple tests with different seeds. Using the first 1000 integers as seeds, I obtained results that can be summarized in this way:
I noticed that the following statement is true for all 1000 seeds:
In fact, the condition probA < 0 corresponds to having an increasing sigmoid function modelling the decision values' response. Given this long premise, my questions are:
Thank you very much to anyone who will contribute.
PS: looking through similar questions I didn't find the answer to mine, but I suspect that it is related to issues #152, #153 and #155.