SVM - Githubissues

NobleKennamer commented 8 years ago

@abhisaarsharma

abhisaarsharma commented 8 years ago

Did an initial commit with the abstract class files, the SVM, and the utility classes. As expected, the scikit-learn SVM with the RBF kernel currently gives a zero true positive rate: True Positives 0, False Positives 483, True Negatives 92658, False Negatives 0.

NobleKennamer commented 8 years ago

Check out what I just added in the RR_Lyrae IPython notebook. I'm able to get slightly better performance using a linear kernel.

I'll also add a somewhat easier classification dataset tomorrow, so that we can get models with higher classification

abhisaarsharma commented 8 years ago

I tried the linear kernel, which gives the same results as we obtained before.

The issue is not with the kernel but with the class weights. As we discussed yesterday, we need a weight distribution that minimizes false positive misclassifications. Running the linear kernel with 'auto' (deprecated) weights gives True Positives 478, False Positives 5, True Negatives 89321, False Negatives 3337.

Running the RBF kernel with 'balanced' (the new heuristic) weights gives True Positives 479, False Positives 4, True Negatives 89396, False Negatives 3262.

This is better than the linear case. I will look further into these weight distributions.
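To make the comparison between the two runs concrete, here is a small sketch that totals the errors and computes accuracy from the four counts quoted above. The counts are used exactly as labeled in this thread; the helper name `summarize` is hypothetical and not part of the project code.

```python
def summarize(tp, fp, tn, fn):
    """Total samples, misclassified samples, and accuracy from four counts."""
    total = tp + fp + tn + fn
    errors = fp + fn
    return {"total": total, "errors": errors, "accuracy": (tp + tn) / total}

# Counts copied from the two runs reported in this thread.
linear_auto = summarize(tp=478, fp=5, tn=89321, fn=3337)
rbf_balanced = summarize(tp=479, fp=4, tn=89396, fn=3262)

print(linear_auto["errors"], rbf_balanced["errors"])  # 3342 3266
print(linear_auto["accuracy"] < rbf_balanced["accuracy"])  # True
```

Both runs cover the same 93141 samples, so the difference amounts to 76 fewer misclassified points for the RBF run. Accuracy alone says little on a dataset this imbalanced, which is why the per-class counts matter more here.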

NobleKennamer commented 8 years ago

Yes, you are right. I was using auto for weights.

That might actually be a really interesting avenue to explore when writing the report.
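For the report, the 'balanced' heuristic can be written out by hand. scikit-learn documents it as `n_samples / (n_classes * np.bincount(y))`; the sketch below reproduces that formula in plain Python. The class sizes (483 and 92658) are an assumption taken from the counts quoted in this thread, and `balanced_class_weights` is a hypothetical helper name.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * k) for cls, k in counts.items()}

# Assumed class sizes: 483 minority samples (label 1), 92658 majority (label 0).
labels = [1] * 483 + [0] * 92658
weights = balanced_class_weights(labels)
print(weights)
```

With these sizes the minority class gets a weight of roughly 96 and the majority class roughly 0.5, so each class contributes the same total weight to the loss.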

abhisaarsharma commented 8 years ago

Hey guys, I am running into some problems implementing the SVM. We need to perform an optimization in the dual space to find the Lagrange multipliers, which involves solving a convex quadratic optimization problem. For this I am using the convex optimization package cvxopt. In all the implementations I have found, we need to feed it an array of size N^2 (since we are in the dual space), where N is the number of data points, but for our dataset this gives an out-of-memory error. I will now look into scikit-learn's own implementation of this; it might take me some time.
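A rough back-of-the-envelope estimate shows why the dense N x N matrix does not fit in memory. This sketch assumes 8-byte floats and takes N = 93141, the total sample count implied by the confusion-matrix numbers earlier in the thread; the actual training set may be smaller. The function name is hypothetical.

```python
def dense_matrix_bytes(n_points, bytes_per_entry=8):
    """Memory needed for a dense n_points x n_points matrix of floats."""
    return n_points * n_points * bytes_per_entry

n = 93141  # assumed N, taken from the sample counts reported in this thread
size_gib = dense_matrix_bytes(n) / 2**30
print(f"{size_gib:.1f} GiB")  # roughly 65 GiB for the matrix alone
```

Even a tenth of the data would need well over half a GiB for this one matrix, before counting the solver's own working copies.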

abhisaarsharma commented 8 years ago

Hey guys, I am having some trouble integrating my Python changes into a copied notebook. Some things do not seem to run: for example, the RR Lyrae notebook uses np.load, but we never wrote import numpy as np. Noble, did you miss a line while committing? I added an import for that.

Also, the plots are not showing up after I run a command; I just get a reference to an object. I am not sure if there is a problem with the inline setting we have specified.

NobleKennamer commented 8 years ago

That is what the %pylab interactive is doing.

abhisaarsharma commented 8 years ago

Hmm, it's weird. I still get an np undefined error if I remove the import. Can you tell me which version of IPython / Jupyter you are using?

On Dec 3, 2015 11:28 AM, "Noble Kennamer" wrote:

> That is what the %pylab interactive is doing.

NobleKennamer commented 8 years ago

I'm using IPython 3.0.0. When you execute the cell %pylab inline, does it say populating the interactive namespace?

abhisaarsharma commented 8 years ago

Turns out it was a problem with my latest Jupyter; after uninstalling it, things seem to be working fine. I just made a copy of the RR_lyrae file and added an SVM to it. For the hand-coded version of the SVM, I found something called SMO (sequential minimal optimization) that avoids constructing the NxN matrix in the dual space. I should probably be done with it by today. Please tell me when you guys are starting to write.
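To illustrate how SMO sidesteps the N x N matrix, here is a minimal sketch of the simplified teaching variant, which picks the second multiplier at random (Platt's full algorithm uses selection heuristics instead). It is not the code committed to this repository. All names, the toy data, and the default parameters are assumptions for illustration. Kernel values are computed on demand, so memory stays linear in N.

```python
import random

def linear_kernel(a, b):
    """Dot product of two equal-length feature vectors."""
    return sum(p * q for p, q in zip(a, b))

def decision(x, X, y, alpha, b, kernel=linear_kernel):
    """f(x) = sum_k alpha_k * y_k * K(x_k, x) + b, skipping zero multipliers."""
    return sum(a * t * kernel(xk, x) for a, t, xk in zip(alpha, y, X) if a != 0.0) + b

def smo_train(X, y, C=1.0, tol=1e-3, max_passes=10, max_sweeps=1000,
              kernel=linear_kernel, seed=0):
    """Simplified SMO: optimize two multipliers at a time, labels in {-1, +1}."""
    rng = random.Random(seed)
    n = len(X)
    alpha, b, passes = [0.0] * n, 0.0, 0
    for _ in range(max_sweeps):  # hard cap so the sketch always terminates
        if passes >= max_passes:
            break
        changed = 0
        for i in range(n):
            E_i = decision(X[i], X, y, alpha, b, kernel) - y[i]
            # Only touch multipliers that violate the KKT conditions.
            if not ((y[i] * E_i < -tol and alpha[i] < C) or
                    (y[i] * E_i > tol and alpha[i] > 0)):
                continue
            j = rng.choice([k for k in range(n) if k != i])
            E_j = decision(X[j], X, y, alpha, b, kernel) - y[j]
            a_i, a_j = alpha[i], alpha[j]
            # Box bounds that keep sum(alpha * y) = 0 and 0 <= alpha <= C.
            if y[i] != y[j]:
                lo, hi = max(0.0, a_j - a_i), min(C, C + a_j - a_i)
            else:
                lo, hi = max(0.0, a_i + a_j - C), min(C, a_i + a_j)
            if lo == hi:
                continue
            k_ii, k_jj = kernel(X[i], X[i]), kernel(X[j], X[j])
            k_ij = kernel(X[i], X[j])
            eta = 2.0 * k_ij - k_ii - k_jj
            if eta >= 0:
                continue
            new_j = min(hi, max(lo, a_j - y[j] * (E_i - E_j) / eta))
            if abs(new_j - a_j) < 1e-5:
                continue
            new_i = a_i + y[i] * y[j] * (a_j - new_j)
            b1 = b - E_i - y[i] * (new_i - a_i) * k_ii - y[j] * (new_j - a_j) * k_ij
            b2 = b - E_j - y[i] * (new_i - a_i) * k_ij - y[j] * (new_j - a_j) * k_jj
            alpha[i], alpha[j] = new_i, new_j
            if 0 < new_i < C:
                b = b1
            elif 0 < new_j < C:
                b = b2
            else:
                b = (b1 + b2) / 2.0
            changed += 1
        passes = passes + 1 if changed == 0 else 0
    return alpha, b

def predict(x, X, y, alpha, b, kernel=linear_kernel):
    """Class label in {-1, +1} from the sign of the decision function."""
    return 1 if decision(x, X, y, alpha, b, kernel) >= 0 else -1

# Toy, linearly separable data (made up for this sketch).
X_toy = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]]
y_toy = [1, 1, 1, -1, -1, -1]
alpha_toy, b_toy = smo_train(X_toy, y_toy)
print([predict(x, X_toy, y_toy, alpha_toy, b_toy) for x in X_toy])
```

Each sweep recomputes the decision function from scratch, which costs O(N) kernel evaluations per point and makes a full pass quadratic in time. Caching the errors E_i is the usual first optimization, at the price of a little extra bookkeeping.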

abhisaarsharma commented 8 years ago

I have completed a version of SMO for the SVM. It runs, but it is slow and takes several hours to converge. I am working on optimizations to make it faster.

abhisaarsharma commented 8 years ago

Optimized the code, but it still takes a long time to finish. Running the code under PyPy or Cython is expected to decrease the running time by a large factor: a test on a small dataset showed a reduction from 10 minutes to approximately one minute. I think I will commit for now and close the issue, since we do not have enough time.

NobleKennamer / astro_porject

SVM #5