RubixML / Sentiment

An example project using a feed-forward neural network for text sentiment classification trained with 25,000 movie reviews from the IMDB website.
https://rubixml.com
MIT License

Error trying to train, WordCountVectorizer missing parameter $maxDocumentFrequency #3

Closed by bavamont 4 years ago

bavamont commented 4 years ago

I get this error when I try to train using your train.php example (https://github.com/RubixML/Sentiment/blob/master/train.php): Fatal error: Uncaught TypeError: Argument 3 passed to Rubix\ML\Transformers\WordCountVectorizer::__construct() must be of the type int, object given....

In your example on Line 44 you have: new WordCountVectorizer(10000, 3, new NGram(1, 2)),

But the constructor for WordCountVectorizer expects this: public function __construct(int $maxVocabulary = PHP_INT_MAX, int $minDocumentFrequency = 1, int $maxDocumentFrequency = PHP_INT_MAX, ?Tokenizer $tokenizer = null)

What parameters would you recommend for WordCountVectorizer so that your example works best?

andrewdalpino commented 4 years ago

Good catch! Did you upgrade versions recently? We added the $maxDocumentFrequency parameter in 0.1.0-rc5. Thanks for the reminder; I am going to update the train script!

Let's try a setting of 5000 for maxDocumentFrequency. Let me know if you get better results with a different setting.
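For anyone landing here with the same error, here is a minimal sketch of why the old call breaks and what the adjusted call looks like. The classes below are hypothetical stand-ins that copy only the constructor signature quoted above, so the snippet runs without the library installed; they are not the real Rubix ML classes, and the stand-in NGram constructor parameters are assumed for illustration.

```php
<?php

// Hypothetical stand-ins mirroring only the constructor signature quoted in
// this issue. They are NOT the real Rubix ML classes.
interface Tokenizer
{
}

class NGram implements Tokenizer
{
    // Parameter names and defaults are assumed for illustration.
    public function __construct(int $min = 2, int $max = 2)
    {
    }
}

class WordCountVectorizer
{
    public $maxDocumentFrequency;
    public $tokenizer;

    public function __construct(
        int $maxVocabulary = PHP_INT_MAX,
        int $minDocumentFrequency = 1,
        int $maxDocumentFrequency = PHP_INT_MAX,
        ?Tokenizer $tokenizer = null
    ) {
        $this->maxDocumentFrequency = $maxDocumentFrequency;
        $this->tokenizer = $tokenizer;
    }
}

// Old call from train.php: the tokenizer lands in the third (int) slot,
// so PHP throws a TypeError.
try {
    new WordCountVectorizer(10000, 3, new NGram(1, 2));
} catch (TypeError $e) {
    echo 'Old call fails: ', $e->getMessage(), PHP_EOL;
}

// Adjusted call: 5000 is the maxDocumentFrequency suggested in this thread,
// and the tokenizer moves to the fourth position.
$vectorizer = new WordCountVectorizer(10000, 3, 5000, new NGram(1, 2));
echo 'New call OK, maxDocumentFrequency = ', $vectorizer->maxDocumentFrequency, PHP_EOL;
```

With the real library, the same argument order should apply to the line in train.php, though the exact namespaces depend on the installed version.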

Also, you're welcome to join our channel on Telegram: https://t.me/RubixML

andrewdalpino commented 4 years ago

Should be fixed in the latest update https://github.com/RubixML/Sentiment/commit/d076d864eece25ba91c792cb6ee49917f5147216

Thanks again @bavamont!

bavamont commented 4 years ago

Thank you @andrewdalpino! I’ll try it with 5000 for maxDocumentFrequency. Thanks again!