mchinen opened this issue 4 years ago
It seems you haven't done proper parameter selection.
./gridregression.py ~/Downloads/mysvmtrainfile.txt
...
[local] -1 -5 -8 0.55566 (best c=16.0, g=1.0, p=0.25, mse=0.294086)
16.0 1.0 0.25 0.294086
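gridregression.py, as its output suggests, searches powers of two for c, g and p and keeps the setting with the lowest cross-validation MSE. The best values above are c = 16 = 2^4, g = 1 = 2^0 and p = 0.25 = 2^-2. Below is a minimal stdlib sketch of that search pattern. It assumes a caller-supplied `cv_mse` function (hypothetical; it stands in for running `svm-train -v` and reading the MSE it prints). The exponent ranges in the usage line are illustrative, not the script's defaults, and the toy objective is made up.

```python
import itertools
from typing import Callable, Iterable, Optional, Tuple

Result = Tuple[float, float, float, float]  # (c, g, p, mse)


def grid_search(
    cv_mse: Callable[[float, float, float], float],
    log2c: Iterable[int],
    log2g: Iterable[int],
    log2p: Iterable[int],
) -> Result:
    """Try every (2^lc, 2^lg, 2^lp) and keep the one with the lowest CV MSE.

    cv_mse is a hypothetical callback; with libsvm it could run
    `svm-train -s 3 -v 5 -c C -g G -p P file` and parse the
    "Cross Validation Mean squared error" line it prints.
    """
    best: Optional[Result] = None
    for lc, lg, lp in itertools.product(log2c, log2g, log2p):
        c, g, p = 2.0 ** lc, 2.0 ** lg, 2.0 ** lp
        mse = cv_mse(c, g, p)
        if best is None or mse < best[3]:
            best = (c, g, p, mse)
    if best is None:
        raise ValueError("empty parameter grid")
    return best


# Made-up objective standing in for real cross-validation, minimized at
# the values reported above (c=16, g=1, p=0.25).
def toy_mse(c: float, g: float, p: float) -> float:
    return (c - 16) ** 2 + (g - 1) ** 2 + (p - 0.25) ** 2


print(grid_search(toy_mse, range(-1, 8), range(-5, 3), range(-8, 1)))
# -> (16.0, 1.0, 0.25, 0.0)
```

The same loop extends to a fourth axis, for example nu (`-n`) when training nu-SVR with `-s 4`.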
libsvm-3.24$ ./svm-train -s 3 -c 16 -g 1 -p 0.25 ~/Downloads/mysvmtrainfile.txt
..
optimization finished, #iter = 1778
nu = 0.509791
obj = -979.425784, rho = -2.770594
nSV = 238, nBSV = 161
libsvm-3.24$ ./svm-predict ~/Downloads/mysvmtrainfile.txt mysvmtrainfile.txt.model o
Mean squared error = 0.208275 (regression)
Squared correlation coefficient = 0.786998 (regression)
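For reference, the two numbers svm-predict reports for regression are the mean squared error and the squared Pearson correlation between predictions and true labels. A minimal stdlib sketch of both formulas (the function name is hypothetical, not part of libsvm):

```python
from typing import Sequence, Tuple


def regression_metrics(targets: Sequence[float], preds: Sequence[float]) -> Tuple[float, float]:
    """Return (MSE, squared correlation coefficient) for a regression run."""
    n = len(targets)
    if n == 0 or n != len(preds):
        raise ValueError("need two non-empty sequences of equal length")
    mse = sum((p - t) ** 2 for p, t in zip(preds, targets)) / n
    sp, st = sum(preds), sum(targets)
    spp = sum(p * p for p in preds)
    stt = sum(t * t for t in targets)
    spt = sum(p * t for p, t in zip(preds, targets))
    # Squared Pearson correlation; undefined (division by zero) if either
    # sequence is constant.
    scc = (n * spt - sp * st) ** 2 / ((n * spp - sp * sp) * (n * stt - st * st))
    return mse, scc


# Predictions squeezed to half the spread of the labels, but perfectly ordered.
labels = [1.0, 2.0, 3.0, 4.0, 5.0]
squeezed = [2.0, 2.5, 3.0, 3.5, 4.0]
print(regression_metrics(labels, squeezed))  # -> (0.5, 1.0)
```

The example shows why a high r^2 alone does not rule out compressed predictions: r^2 ignores scale and offset, while MSE does not.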
A squared correlation coefficient (r^2) of about 0.78 on the training data isn't too bad.
libsvm-3.24$ wc -l o
376 o
libsvm-3.24$ grep -e "4." o |wc -l
85
libsvm-3.24$ cut -f 1 -d ' ' ~/Downloads/mysvmtrainfile.txt | grep -e "4." |wc -l
100
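One caveat on these counts: in `grep -e "4."` the `.` is a regex wildcard, so the pattern matches a `4` followed by any character anywhere on the line (3.45 is counted, 5 is not). A minimal stdlib sketch of a numeric count instead, assuming the svm-predict output file has one value per line and the training file is in libsvm format with the label first. The helper names are hypothetical.

```python
import re
from typing import Iterable, List


def count_in_range(values: Iterable[float], lo: float, hi: float) -> int:
    """Count values v with lo <= v < hi."""
    return sum(1 for v in values if lo <= v < hi)


def read_predictions(path: str) -> List[float]:
    # svm-predict writes one predicted value per line to its output file.
    with open(path) as f:
        return [float(line) for line in f if line.strip()]


def read_labels(path: str) -> List[float]:
    # In libsvm's data format the label is the first whitespace-separated field.
    with open(path) as f:
        return [float(line.split()[0]) for line in f if line.strip()]


lines = ["3.45", "4.2", "2.9", "5"]
print(sum(1 for s in lines if re.search("4.", s)))  # -> 2 (3.45 and 4.2)
print(count_in_range(map(float, lines), 4.0, 5.0))  # -> 1
```

With the files from this thread, `count_in_range(read_predictions("o"), 4.0, float("inf"))` and the same call on `read_labels(...)` would compare how many predictions and labels are at least 4.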
On 2019-12-18 12:57, Michael Chinen wrote:
When training with svm-train -s 4 -t 2 -n .6 -c .4
I find that the predictions are very much compressed. For example, myfile has labels in the 1 to 5 range, with a significant number in the 4 to 5 range, but the highest predicted value on the training set is below 4.0. There also seem to be fewer predictions in the 1.0 to 2.0 range. I've experimented with NU_SVR and EPSILON_SVR and the other parameters and haven't found a good solution. Any ideas? Here is my training file. Even when normalizing the labels to 0-1 I get the same behavior, where the highest value is .72.
Unnormalized: mysvmtrainfile.txt [1]
Normalized: normsvmtrain.txt [2]
Links:
[1] https://github.com/cjlin1/libsvm/files/3980481/mysvmtrainfile.txt
[2] https://github.com/cjlin1/libsvm/files/3980504/normsvmtrain.txt
[3] https://github.com/cjlin1/libsvm/issues/158?email_source=notifications&email_token=ABI3BHTWSZKFX2YNIOVM6BLQZKFEXA5CNFSM4J4RT4N2YY3PNVWWK3TUL52HS4DFUVEXG43VMWVGG33NNVSW45C7NFSM4IBOQ3AQ
[4] https://github.com/notifications/unsubscribe-auth/ABI3BHWMSRJXVU6YO3WPLXTQZKFEXANCNFSM4J4RT4NQ
Thanks so much, that does seem to be the issue. Before reading your PDF I hadn't realized the importance of searching the parameters, and had reused our previous model's parameters. I modified grid.py to run the search and found better parameters, which were wildly different. I also found that I needed to tune the nu parameter.
However, I see that my problem was confounded by another issue, which I have also resolved:
When training with
svm-train -s 4 -t 2 -n .6 -c .4 <myfile>
I find that the predictions are very much compressed. For example, myfile has labels in the 1 to 5 range, with a significant number in the 4 to 5 range, but the highest predicted value on the training set is below 4.0. There also seem to be fewer predictions in the 1.0 to 2.0 range. I've experimented with NU_SVR and EPSILON_SVR and the other parameters and haven't found a good solution. Here is my training file. Even when normalizing the labels to 0-1 I get the same behavior, where the highest predicted value is .72.
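One way to see where this kind of compression can come from: for regression, libsvm predicts f(x) = sum_i coef_i * K(sv_i, x) - rho, using the `sv_coef` and `rho` stored in the model file. With the RBF kernel (`-t 2`) every kernel value lies in (0, 1]. As far as I understand libsvm's SVR duals, each coefficient is also bounded in magnitude by C. Under that assumption, a small C such as `-c .4` limits how far f(x) can move away from -rho, and far from all support vectors f(x) falls back to -rho. Below is a minimal stdlib sketch with a made-up two-vector model. It is not libsvm's code, and the function names are hypothetical.

```python
import math
from typing import Sequence


def rbf(x: Sequence[float], z: Sequence[float], gamma: float) -> float:
    # RBF kernel selected by `-t 2`: exp(-gamma * ||x - z||^2), always in (0, 1].
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def svr_predict(
    x: Sequence[float],
    support_vectors: Sequence[Sequence[float]],
    coefs: Sequence[float],
    rho: float,
    gamma: float,
) -> float:
    # f(x) = sum_i coef_i * K(sv_i, x) - rho
    return sum(c * rbf(sv, x, gamma) for sv, c in zip(support_vectors, coefs)) - rho


# Toy model (made-up numbers): two support vectors with coefficients of size 0.4.
model = dict(support_vectors=[[0.0], [1.0]], coefs=[0.4, -0.4], rho=-3.0, gamma=1.0)
for x in ([0.0], [0.5], [1.0], [10.0]):
    print(x, round(svr_predict(x, **model), 4))
```

In this toy model every prediction stays within -rho plus or minus the sum of |coef_i|, that is, between 2.2 and 3.8, no matter what labels the support vectors came from.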
First, I'd like to know if I'm doing something incorrectly. Next, if this is a correct model, why is it so compressed? I would like the predictions to be closer to the boundaries of the training labels. I understand that we would expect some compression towards the mean in regression, but this seems more than I would expect. Should I normalize the predicted output to match the input label distribution?
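On the last question, one option is a post-hoc affine rescaling that gives the predictions the same mean and standard deviation as the training labels. This is a calibration choice made outside libsvm, not something svm-train or svm-predict does. Because it preserves ordering, r^2 is unchanged, but MSE can get worse when part of the shrinkage reflects genuine uncertainty. Fitting the mapping on held-out predictions rather than training-set predictions would be the safer variant. Below is a minimal stdlib sketch; the function name is hypothetical.

```python
import statistics
from typing import List, Sequence


def match_moments(preds: Sequence[float], labels: Sequence[float]) -> List[float]:
    """Affinely rescale predictions so their mean and (population) standard
    deviation match those of the training labels."""
    mp, sp = statistics.fmean(preds), statistics.pstdev(preds)
    ml, sl = statistics.fmean(labels), statistics.pstdev(labels)
    if sp == 0:
        raise ValueError("predictions are constant; nothing to rescale")
    return [ml + (p - mp) * sl / sp for p in preds]


stretched = match_moments([2.0, 2.5, 3.0, 3.5, 4.0], [1.0, 2.0, 3.0, 4.0, 5.0])
print([round(v, 6) for v in stretched])  # -> [1.0, 2.0, 3.0, 4.0, 5.0]
```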
Unnormalized: mysvmtrainfile.txt
Normalized: normsvmtrain.txt
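The thread does not say how normsvmtrain.txt was produced. Assuming a plain min-max mapping of the labels to [0, 1], a prediction v on that scale maps back as lo + v * (hi - lo). If the labels span exactly 1 to 5 (also an assumption), the reported maximum of .72 corresponds to 1 + 0.72 * 4 = 3.88, which matches the "below 4.0" ceiling seen on the original scale. Below is a minimal stdlib sketch with hypothetical helper names.

```python
from typing import List, Tuple


def scale_labels(lines: List[str]) -> Tuple[List[str], float, float]:
    """Min-max scale the label (first field) of libsvm-format lines to [0, 1].

    Returns the rewritten lines plus the original (min, max) so predictions
    can be mapped back later. Feature fields are left untouched.
    """
    rows = [ln.split(maxsplit=1) for ln in lines if ln.strip()]
    ys = [float(r[0]) for r in rows]
    lo, hi = min(ys), max(ys)
    if hi == lo:
        raise ValueError("all labels are equal")
    out = []
    for r, y in zip(rows, ys):
        feats = r[1] if len(r) > 1 else ""
        out.append(f"{(y - lo) / (hi - lo):.10g} {feats}".rstrip())
    return out, lo, hi


def unscale(pred: float, lo: float, hi: float) -> float:
    """Map a prediction made on the [0, 1] scale back to the original label units."""
    return lo + pred * (hi - lo)


scaled, lo, hi = scale_labels(["1 1:0.5", "5 1:0.1", "3 1:0.2"])
print(scaled)                           # -> ['0 1:0.5', '1 1:0.1', '0.5 1:0.2']
print(round(unscale(0.72, lo, hi), 2))  # -> 3.88
```

libsvm's own svm-scale can also rescale labels with `-y lower upper`, although by default it scales the features as well.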