Tilakkumar / cleartk

Automatically exported from code.google.com/p/cleartk

provide an example document classification pipeline using TF-IDF features #74

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
This is a common use case that can also serve to highlight feature extraction / feature encoding / feature encoding factories.

Original issue reported on code.google.com by phwetz...@gmail.com on 18 Mar 2009 at 7:48

GoogleCodeExporter commented 8 years ago
I committed what I have so far in r395. This is not completely tested, and in fact has definite problems with the full 20 newsgroups data set.

Original comment by phwetz...@gmail.com on 18 Mar 2009 at 8:01

GoogleCodeExporter commented 8 years ago
I added FeatureCollection and FeatureCollectionEncoder and modified Counts, then changed IDFMapWriter to make proper use of them. This should make it more useful, and I will try to build a pipeline around it next.

These changes are committed in r826.

Original comment by phwetz...@gmail.com on 21 Jul 2009 at 10:30

GoogleCodeExporter commented 8 years ago
I rewrote the document classification example. It now works, and it is clear what should be run and in what order (i.e. Step1BuildIdfMap.java, etc.). What remains is to add a wiki page explaining what is going on. But this issue is closed.

Original comment by pvogren@gmail.com on 13 Apr 2010 at 10:56
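
For readers unfamiliar with the two-pass structure the thread refers to (build an IDF map first, then use it to weight features), here is a minimal sketch in plain Java. It does not use ClearTK's API: the class `TfIdfSketch`, the methods `buildIdfMap` and `tfIdf`, and the toy corpus are all hypothetical names chosen for illustration. The unsmoothed `ln(N / df)` weighting is one common variant and may differ from what the actual example computes.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of ClearTK.
class TfIdfSketch {

    // Step 1: one pass over the corpus to compute idf(t) = ln(N / df(t)),
    // where df(t) is the number of documents that contain term t.
    static Map<String, Double> buildIdfMap(List<List<String>> corpus) {
        Map<String, Integer> docFreq = new HashMap<>();
        for (List<String> doc : corpus) {
            // Use a set so each term counts at most once per document.
            for (String term : new HashSet<>(doc)) {
                docFreq.merge(term, 1, Integer::sum);
            }
        }
        int n = corpus.size();
        Map<String, Double> idf = new HashMap<>();
        for (Map.Entry<String, Integer> e : docFreq.entrySet()) {
            idf.put(e.getKey(), Math.log((double) n / e.getValue()));
        }
        return idf;
    }

    // Step 2: weight the raw term counts of one document by the IDF map.
    // Terms absent from the map are skipped (an assumption of this sketch).
    static Map<String, Double> tfIdf(List<String> doc, Map<String, Double> idf) {
        Map<String, Integer> counts = new HashMap<>();
        for (String term : doc) {
            counts.merge(term, 1, Integer::sum);
        }
        Map<String, Double> features = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            Double weight = idf.get(e.getKey());
            if (weight != null) {
                features.put(e.getKey(), e.getValue() * weight);
            }
        }
        return features;
    }

    public static void main(String[] args) {
        // Toy, already-tokenized corpus standing in for real documents.
        List<List<String>> corpus = Arrays.asList(
            Arrays.asList("space", "shuttle", "launch"),
            Arrays.asList("hockey", "game", "launch"),
            Arrays.asList("space", "space", "orbit"));
        Map<String, Double> idf = buildIdfMap(corpus);
        System.out.println(tfIdf(corpus.get(2), idf));
    }
}
```

The IDF map has to be built over the whole training corpus before any single document can be turned into features, which is why the steps must run in a fixed order. The resulting feature maps would then be handed to whatever classifier the pipeline uses.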