DigitalPebble / behemoth

Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.

Mahout : fix the vocabulary size #19

Closed jnioche closed 7 years ago

jnioche commented 13 years ago

See http://osdir.com/ml/general/2011-04/msg00949.html

/////////////////////////////////////////////////////////////

Note that you probably need to introduce an "OTHER" token so that you can fix the vocabulary size.
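
A minimal sketch of the "OTHER" token idea, in plain Python rather than Mahout's API. The names `build_vocabulary`, `vectorize` and `max_size` are hypothetical, chosen only for illustration. The most frequent tokens each get their own index, and every remaining or unseen token is mapped to a single shared slot, so the vector length never exceeds `max_size`.

```python
from collections import Counter

OTHER = "OTHER"  # catch-all token; the name is an assumption for this sketch


def build_vocabulary(documents, max_size):
    """Map the (max_size - 1) most frequent tokens to indices and
    reserve the last index for OTHER. The result has at most max_size entries."""
    counts = Counter(
        token for doc in documents for token in doc if token != OTHER
    )
    # Sort by descending frequency, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    kept = [token for token, _ in ranked[: max_size - 1]]
    vocab = {token: index for index, token in enumerate(kept)}
    vocab[OTHER] = len(vocab)
    return vocab


def vectorize(doc, vocab):
    """Count tokens into a fixed-length vector; unknown tokens go to OTHER."""
    vector = [0] * len(vocab)
    other_index = vocab[OTHER]
    for token in doc:
        vector[vocab.get(token, other_index)] += 1
    return vector
```

Because the vocabulary is frozen after `build_vocabulary`, documents seen later produce vectors of the same length, at the cost of losing the identity of rare words.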

Alternatively, hashed representations let you keep an open vocabulary while still having a fixed feature vector size.
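
A rough sketch of the hashed representation, again in plain Python and not Mahout's own encoders. The function name `hashed_vector` and the parameter `num_features` are hypothetical, and CRC32 is used here only because it is a stable hash available in the standard library. Each token is hashed to an index modulo the vector size, so no vocabulary has to be stored and new words never change the vector length.

```python
import zlib


def hashed_vector(doc, num_features):
    """Count tokens into a vector of length num_features using a stable hash.
    Distinct tokens may collide on the same index; that is the trade-off."""
    vector = [0] * num_features
    for token in doc:
        # zlib.crc32 is deterministic across runs, unlike the built-in hash() for str.
        index = zlib.crc32(token.encode("utf-8")) % num_features
        vector[index] += 1
    return vector
```

A larger `num_features` reduces collisions at the cost of longer (usually sparse) vectors.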

jnioche commented 7 years ago

@smarthi since you are a Mahout expert, is this still an issue and if so how could it be fixed?

smarthi commented 7 years ago

This is highly irrelevant today, given the present state of Mahout.



jnioche commented 7 years ago

OK, closing then. Thanks.