The following is from a posting by Olivier Grisel. This is something we
should consider learning about and working on.
I wondered if you were aware of the recent developments around
sparsity-preserving dimensionality reduction of the feature space based
on hash functions, a.k.a. the hashing trick:
http://hunch.net/~jl/projects/hash_reps/
All three of the papers mentioned there are worth reading in the right
order. The latest one is the best suited to a cleartk implementation but
lacks the technical details of the first two. The most interesting point,
in my opinion, is that the hashing trick makes it possible to drop the
requirement of maintaining a huge vocabulary mapping in memory when using
bag-of-words feature extraction.
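As a rough illustration of the idea (a minimal sketch, not code from the papers or from cleartk): each token is hashed straight to a column index in a fixed-size vector, so no token-to-index dictionary is ever built. The function name `hash_features`, the default bucket count, and the choice of CRC32 as the hash are assumptions made for this example only.

```python
import zlib


def hash_features(tokens, n_buckets=2 ** 18):
    """Turn a bag of tokens into a sparse {index: count} mapping.

    No vocabulary is stored: the column index of a token is derived
    from its hash, so memory use does not grow with the number of
    distinct tokens seen. Different tokens may collide in one bucket.
    """
    vector = {}
    for token in tokens:
        # CRC32 is deterministic across runs, unlike Python's built-in hash().
        index = zlib.crc32(token.encode("utf-8")) % n_buckets
        vector[index] = vector.get(index, 0) + 1
    return vector


if __name__ == "__main__":
    doc = "the cat sat on the mat".split()
    print(hash_features(doc, n_buckets=32))
```

The output stays sparse (only buckets that were hit are stored), and the same token always lands in the same bucket, so vectors built at training time and at prediction time are compatible without sharing any mapping. The cost is that collisions merge unrelated features; a larger `n_buckets` makes them rarer.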
I think a feature hashing preprocessing step would be a typical reusable
component for the cleartk project to provide for preparing the ML
input.
Original issue reported on code.google.com by pvogren@gmail.com on 5 Aug 2009 at 3:37