NaturalNode / natural

general natural language facilities for node
MIT License

Logistic Regression Classifier throws unable to find minimum exception #220

Closed waglik closed 9 years ago

waglik commented 9 years ago

LogisticRegressionClassifier (LRC) throws this exception for a reason I cannot identify. The same code was working yesterday, but today it crashes when calling:

classifier.train();

Removing a random combination of addDocument() calls seems to help, but the problem does not appear to be caused by any one particular document.

For example:

classifier.addDocument('London', 'London');
classifier.addDocument('NewYork', 'NewYork');
classifier.addDocument('Toronto', 'Toronto');

fails, but it works if I remove London. On the other hand, keeping London but removing Toronto works too.

In total I add about 100 documents to the training set.

Any suggestions as to what went wrong?

kkoch986 commented 9 years ago

are you getting a specific error when it crashes?

waglik commented 9 years ago

just this :

/logistic_regression_classifier.js:81
throw 'unable to find minimum';

brandonburkett commented 9 years ago

I am running into this same issue today, specifically with LogisticRegressionClassifier; the BayesClassifier seems to work fine with train().

Oddly, mine seems to be triggered around the 175th addDocument call.
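As a stopgap consistent with the reports above, one option is to catch the thrown value and fall back to a second classifier. The following is only a sketch: trainWithFallback and the { text, label } document shape are hypothetical and not part of natural; the only assumption about the classifiers is that they expose addDocument() and train() as in the snippets in this thread. Note that the error is thrown as a plain string, so it is compared with === rather than read from e.message.

```javascript
// Hypothetical helper, not part of natural: try one classifier and fall
// back to another if training fails with the error reported in this thread.
function trainWithFallback(primary, fallback, documents) {
  // Feed every { text, label } pair (an assumed shape) to a classifier.
  function load(classifier) {
    documents.forEach(function (doc) {
      classifier.addDocument(doc.text, doc.label);
    });
  }

  try {
    load(primary);
    primary.train();
    return primary;
  } catch (e) {
    // The thread shows a bare string being thrown, not an Error object.
    if (e !== 'unable to find minimum') {
      throw e; // anything else is unexpected, so re-throw it
    }
    load(fallback);
    fallback.train();
    return fallback;
  }
}

// Possible usage with natural (not executed here):
//   var natural = require('natural');
//   var trained = trainWithFallback(
//     new natural.LogisticRegressionClassifier(),
//     new natural.BayesClassifier(),
//     [{ text: 'London', label: 'London' }]
//   );
```

This hides the failure rather than fixing it, so it is only useful while comparing the two classifiers or waiting for a fix.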

rickystillwell commented 9 years ago

I seem to also be hitting this error with the LogisticRegressionClassifier... It popped up after removing 3 of my 200+ addDocument calls

kkoch986 commented 9 years ago

interesting... all of you are seeing the same error? The file hasn't had any modifications in 2 years, so if something new happened it must be in one of its deps

brandonburkett commented 9 years ago

anything we can do to help? do you need a more detailed example to replicate?

kkoch986 commented 9 years ago

an example would be helpful, but it sounds like it involves reading many files. @chrisumbel any thoughts on this?

waglik commented 9 years ago

You can use mine. This code works, but uncomment any of the 3 commented-out lines ("LosAngeles", "Montreal" or "Toronto") and you should get the error.

In case you wonder: npm list natural gives the following tree

├─┬ kue@0.8.10
│ └─┬ reds@0.2.4
│   └── natural@0.1.17
└── natural@0.1.29

example :

var natural = require('natural'),
    classifier = new natural.LogisticRegressionClassifier();

//CITIES
classifier.addDocument('London', 'London');
classifier.addDocument('NYC', 'NewYork');
classifier.addDocument(['New','York'], 'NewYork');
classifier.addDocument('NewYork', 'NewYork');
classifier.addDocument(['San','Francisco'], 'SanFrancisco');
classifier.addDocument('SanFrancisco', 'SanFrancisco');
classifier.addDocument(['Los','Angeles'], 'LosAngeles');
//classifier.addDocument('LosAngeles', 'LosAngeles');
//classifier.addDocument('Montreal', 'Montreal');
//classifier.addDocument('Toronto', 'Toronto');

//JOBS

classifier.addDocument(['Graphic','Design'], 'Design');
classifier.addDocument(['Linux','System','Administrator'], 'SysAdmin');
classifier.addDocument(['Product','Designer'], 'Design');
classifier.addDocument('Developer', 'Software');
classifier.addDocument(['web','developer'], 'Software');
classifier.addDocument(['web','development'], 'Software');
classifier.addDocument(['web','programmer'], 'Software');
classifier.addDocument('HTML', 'Software');
classifier.addDocument('CSS', 'Software');
classifier.addDocument('JQuery', 'Software');
classifier.addDocument('JS', 'Software');
classifier.addDocument('PHP', 'Software');
classifier.addDocument('Java', 'Software');
classifier.addDocument('JavaScript', 'Software');
classifier.addDocument('Node.js', 'Software');
classifier.addDocument('AngularJS', 'Software');
classifier.addDocument('Ruby', 'Software');
classifier.addDocument('Rails', 'Software');
classifier.addDocument('Python', 'Software');
classifier.addDocument(['Web','Designer'], 'WebDesign');
classifier.addDocument(['Web','Design'], 'WebDesign');
classifier.addDocument('Front-End', 'Software');
classifier.addDocument(['Linux','Engineer'], 'Software');
classifier.addDocument('APIs', 'Software');
classifier.addDocument('Logos', 'Design');
classifier.addDocument('Logo', 'Design');
classifier.addDocument(['Music','Composer'], 'Music');
classifier.addDocument(['Sound','Engineer'], 'Music');
classifier.addDocument('Musician', 'Music');
classifier.addDocument(['Growth','Marketer'], 'Marketing');
classifier.addDocument('Illustrator', 'Art');
classifier.addDocument(['Responsive','Design'], 'WebDesign');
classifier.addDocument('SEO', 'Marketing');
classifier.addDocument(['SEO','Specialist'], 'Marketing');
classifier.addDocument('caricature', 'Art');
classifier.addDocument(['Systems','administrator'], 'SysAdmin');
classifier.addDocument(['Social','Media','Manager'], 'Marketing');
classifier.addDocument('.NET', 'Software');
classifier.addDocument('C#', 'Software');
classifier.addDocument(['3D','artist'], 'Art');
classifier.addDocument(['3D','Design'], 'Design');
classifier.addDocument(['content','marketer'], 'Marketing');
classifier.addDocument(['social','media','channels'], 'Marketing');
classifier.addDocument('Translator', 'Content');
classifier.addDocument('Writer', 'Content');
classifier.addDocument('Journalists', 'Content');
classifier.addDocument('Transcriber', 'Content');
classifier.addDocument('WordPress', 'Software');
classifier.addDocument(['Wordpress','Developer'], 'Software');
classifier.addDocument('linkbuilding', 'Marketing');
classifier.addDocument('Blogger', 'Content');
classifier.addDocument('Writer', 'Content');
classifier.addDocument(['Admin','Exec','Assistant'], 'Office');
classifier.addDocument(['Website','Design'], 'WebDesign');
classifier.addDocument(['Virtual','Assistant'], 'Assistant');
classifier.addDocument(' animation', 'Art');
classifier.addDocument(' Artist', 'Art');
classifier.addDocument(['Android','App','Developer'], 'Software');
classifier.addDocument(['App','Developer'], 'Software');
classifier.addDocument(' Android', 'Software');
classifier.addDocument(['Animator','General','Designer'], 'Art');
classifier.addDocument(' Animator', 'Art');
classifier.addDocument(' writing', 'Content');
classifier.addDocument(' blogs', 'Content');
classifier.addDocument(' newsletter', 'Content');
classifier.addDocument(' illustrations', 'Design');
classifier.addDocument(' Portraits', 'Art');
classifier.addDocument(' editor', 'Content');
classifier.addDocument(' design work.', 'Design');
classifier.addDocument(' UI/UX', 'WebDesign');
classifier.addDocument(' Marketing', 'Marketing');
classifier.addDocument(' Designer', 'Design');
classifier.addDocument(' Logo', 'Logo');
classifier.addDocument(['UI','Design'], 'Design');
classifier.addDocument(' C++', 'Software');
classifier.addDocument(' Programmer', 'Software');
classifier.addDocument(' Cartoon', 'Art');
classifier.addDocument(' photoshop', 'Content');
classifier.addDocument(' photoshop', 'Art');
classifier.addDocument(['Graphic','Designer'], 'Design');
classifier.addDocument(['Cloud','specialist'], 'SysAdmin');
classifier.addDocument(' Sysadmin', 'SysAdmin');
classifier.addDocument(' AWS', 'SysAdmin');
classifier.addDocument(' Linode', 'SysAdmin');
classifier.addDocument(' Tutor', 'Education');
classifier.addDocument(['Computer', 'Vision'], 'Software');
classifier.addDocument(['Content','Writer'], 'Content');
classifier.addDocument('Copywriter', 'Content');
classifier.addDocument('Designer', 'Design');
classifier.addDocument('Illustrator', 'Design');
classifier.addDocument('Brand', 'Marketing');
classifier.addDocument(['Digital','Media','Creation'], 'Content');
classifier.addDocument(['Electronics','engineer'], 'Engineer');
classifier.addDocument(['Engineering','physicist'], 'Engineer');
classifier.addDocument(['Mechanical','Engineer'], 'Engineer');
classifier.addDocument('Excel', 'Office');
classifier.addDocument(['software','developer'], 'Software');
classifier.addDocument('hadoop', 'Software');
classifier.addDocument(['linux','systems','administration'], 'SysAdmin');
classifier.addDocument(['iOS','developer'], 'Software');
classifier.addDocument(['Music','Composer'], 'Music');
classifier.addDocument(' Photojournalist', 'Content');
classifier.addDocument(['Software','Engineer'], 'Software');
classifier.addDocument(['backend','developer'], 'Software');
classifier.addDocument(['full-stack','engineer'], 'Software');
classifier.addDocument('C#', 'Software');
classifier.addDocument('Objective C', 'Software');
classifier.addDocument('SQL', 'Software');
classifier.addDocument('VB.NET', 'Software');
classifier.addDocument('MySQL', 'Software');
classifier.addDocument('iOS development.', 'Software');

classifier.train();

waglik commented 9 years ago

Unfortunately I do not know the details of the algorithm, but I noticed that the descendGradient function has its maximum number of iterations hard-coded to 500: var maxIt = 500;

Is there a particular reason for 500? Raising the maximum number of iterations "solves" the problem, but I am not sure about side effects.

brandonburkett commented 9 years ago

@waglik were you able to raise this number without manually editing or forking this project? I could easily have 500+ documents to add to the classifier, and I'd like to see how Bayes and logistic regression compare to each other.

chrisumbel commented 9 years ago

The max iterations deals with minimizing the cost function. A max number of iterations is required just to stop natural from continuing to minimize over insanely small numbers forever. It should be safe to increase it -- or make it a parameter.

I'd have to dig back in to see if there's a reasonable way to automatically determine a good number.
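To illustrate the role of the iteration cap described above, here is a minimal sketch of gradient descent on a one-dimensional cost function. It is not the apparatus implementation: the function signature, the options object, the learning rate and the tolerance are all assumptions made for this example. Only the names descendGradient and maxIt, the default of 500 and the thrown string come from this thread.

```javascript
// Illustrative sketch only; NOT the code in apparatus.
// options.maxIt, options.learningRate and options.tolerance are hypothetical.
function descendGradient(gradient, start, options) {
  var opts = options || {};
  var maxIt = opts.maxIt || 500;              // cap quoted in this thread
  var learningRate = opts.learningRate || 0.1;
  var tolerance = opts.tolerance || 1e-6;
  var x = start;

  for (var i = 0; i < maxIt; i++) {
    var step = learningRate * gradient(x);
    x -= step;
    // Stop once the updates become negligibly small.
    if (Math.abs(step) < tolerance) {
      return { x: x, iterations: i + 1 };
    }
  }
  // The cap was reached before the updates became small enough.
  throw 'unable to find minimum';
}

// Example cost: (x - 3)^2, whose gradient is 2 * (x - 3).
function gradient(x) {
  return 2 * (x - 3);
}

var result = descendGradient(gradient, 0, { maxIt: 500 });
console.log(result.x.toFixed(3), 'after', result.iterations, 'iterations');
```

In this sketch, a cap that is too low for the problem (for example { maxIt: 10 } here) triggers the same exception even though the minimum exists, which matches the observation that raising the limit makes training succeed.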

waglik commented 9 years ago

@devourment77. No. I temporarily changed it in the source code.

@chrisumbel. Then it seems that being able to set this value somewhere in the configuration would be a good idea.

chrisumbel commented 9 years ago

The next release of apparatus, which will hopefully go out in a few days, will fix this. I'll update and close this issue when that occurs.

dsl101 commented 5 years ago

Sorry to reply to a really old thread, but I'm hitting this error, and I can't see how the updated apparatus module helped. The maxIt value still seems to be hard-coded. I submitted a ticket about it here but got no reply, so I wasn't sure what to do. Then my project went dormant for a while and has just resurfaced, so it would be great to get a hint...

Should this be fixed in apparatus (guessing so) rather than natural? And if so, how could that parameter be exposed all the way up to natural so that I can call something like:

classifier = new natural.LogisticRegressionClassifier({maxIt: 1000})