JamesHe1990 / cleartk

Automatically exported from code.google.com/p/cleartk

Hashing Trick for feature-space dimensionality reduction #104

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
The following is from a posting by Olivier Grisel.  This is something we
should consider learning about and working on.  

I wondered if you were aware of the recent developments in
sparsity-preserving feature-space dimensionality reduction based on hash
functions, a.k.a. the hashing trick:

 http://hunch.net/~jl/projects/hash_reps/

All three of the papers mentioned there are worth reading, in order.
The latest one is the best suited to a cleartk implementation but lacks
the technical details of the first two. The most interesting point, in
my opinion, is that the hashing trick makes it possible to drop the
requirement of maintaining a huge vocabulary mapping in memory when
using bag-of-words-based feature extraction.
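
To make the idea concrete, here is a minimal Java sketch of feature hashing for a bag of words. It is only an illustration under stated assumptions, not code from cleartk or from the linked papers: the class name HashedBagOfWords, the bucket count, the use of String.hashCode() with a MurmurHash3-style finalizer, and the salt used for the sign bit are all hypothetical choices. Each token is mapped straight to a bucket index by hashing, so no token-to-index vocabulary is ever stored, and a second hash bit picks a sign of +1 or -1 so that colliding tokens tend to cancel rather than always add up.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of the hashing trick; not part of cleartk. */
final class HashedBagOfWords {
    private final int numBuckets;

    HashedBagOfWords(int numBuckets) {
        if (numBuckets <= 0) {
            throw new IllegalArgumentException("numBuckets must be positive");
        }
        this.numBuckets = numBuckets;
    }

    /** Bit mixer (the MurmurHash3 32-bit finalizer) to spread String.hashCode() bits. */
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    /** Bucket index in [0, numBuckets); floorMod keeps it non-negative. */
    int index(String feature) {
        return Math.floorMod(mix(feature.hashCode()), numBuckets);
    }

    /** Sign of +1 or -1, taken from a differently salted hash (salt is an arbitrary choice). */
    int sign(String feature) {
        return (mix(feature.hashCode() ^ 0x9E3779B9) & 1) == 0 ? 1 : -1;
    }

    /** Builds a sparse vector (bucket index to signed count) without any vocabulary map. */
    Map<Integer, Double> vectorize(Iterable<String> tokens) {
        Map<Integer, Double> vector = new TreeMap<>();
        for (String token : tokens) {
            vector.merge(index(token), (double) sign(token), Double::sum);
        }
        return vector;
    }
}
```

Calling new HashedBagOfWords(1 << 18).vectorize(tokens) on a tokenized document would yield a sparse vector whose dimensionality is fixed in advance by the bucket count, regardless of how many distinct tokens the corpus contains. The price is that distinct tokens can collide in the same bucket, and the mapping cannot be inverted to recover the original token.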

I think a feature-hashing preprocessor would be a typical reusable
component for the cleartk project to provide as a preprocessing step
for the ML input.

Original issue reported on code.google.com by pvogren@gmail.com on 5 Aug 2009 at 3:37

GoogleCodeExporter commented 9 years ago

Original comment by steven.b...@gmail.com on 24 Jul 2012 at 8:17

GoogleCodeExporter commented 9 years ago

Original comment by steven.b...@gmail.com on 3 May 2013 at 8:44

GoogleCodeExporter commented 9 years ago

Original comment by phi...@ogren.info on 15 Mar 2014 at 5:41