ClearTK / cleartk

Machine learning components for Apache UIMA
http://cleartk.github.io/cleartk/

Hashing Trick for feature-space dimensionality reduction #102

Open bethard opened 9 years ago

bethard commented 9 years ago

Original issue 104 created by ClearTK on 2009-08-05T15:37:07.000Z:

The following is from a posting by Olivier Grisel. This is something we should consider learning about and working on.

I wondered if you were aware of the recent developments in sparsity-preserving, feature-space dimensionality reduction based on hash functions, a.k.a. the hashing trick:

http://hunch.net/~jl/projects/hash_reps/

All three of the papers listed there are worth reading, in order. The latest one is the best suited to a ClearTK implementation, but it lacks the technical details of the first two. The most interesting point, in my opinion, is that the hashing trick makes it possible to drop the requirement of keeping a huge vocabulary mapping in memory when using bag-of-words feature extraction.

I think feature hashing would be a typical reusable component for the ClearTK project to provide as a preprocessing step for the ML input.
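
To make the idea concrete, here is a minimal sketch of vocabulary-free feature extraction, written in Python for brevity. It is not ClearTK code: the function name `hash_features`, the default bucket count, and the use of MD5 as the hash function are all assumptions chosen for illustration. The sign bit is a commonly used variant in which colliding features tend to cancel rather than accumulate.

```python
import hashlib
from collections import defaultdict


def hash_features(tokens, n_buckets=2 ** 18):
    """Map tokens to a sparse {index: value} vector without a vocabulary.

    Hypothetical helper for illustration only. Each token is hashed
    directly to a bucket index, so no token-to-index dictionary has
    to be built or kept in memory.
    """
    vector = defaultdict(float)
    for token in tokens:
        # MD5 is used only because it is stable across runs and machines;
        # any well-mixing hash function would do (assumption).
        digest = hashlib.md5(token.encode("utf-8")).digest()
        # The first 8 bytes select the bucket.
        index = int.from_bytes(digest[:8], "little") % n_buckets
        # One bit of a different byte selects the sign of the update.
        sign = 1.0 if digest[8] & 1 == 0 else -1.0
        vector[index] += sign
    return dict(vector)


if __name__ == "__main__":
    doc = "the quick brown fox jumps over the lazy dog".split()
    print(hash_features(doc, n_buckets=32))
```

The dimensionality of the output is fixed by `n_buckets` rather than by the number of distinct tokens seen, so training and prediction can share the mapping without exchanging a vocabulary file. The trade-off is that distinct tokens can collide in the same bucket, and the mapping cannot be inverted to recover feature names.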

bethard commented 9 years ago

Comment #1 originally posted by ClearTK on 2012-07-24T20:17:22.000Z:

<empty>

bethard commented 9 years ago

Comment #2 originally posted by ClearTK on 2013-05-03T08:44:33.000Z:

<empty>

bethard commented 9 years ago

Comment #3 originally posted by ClearTK on 2014-03-15T17:41:52.000Z:

<empty>