BIDS-collaborative / destress

Helping @peparedes with text analysis of LiveJournal data
ISC License

Baseline classifiers #26

Closed davclark closed 9 years ago

davclark commented 9 years ago

John recommended increasing threshold to include ~10M tokens.

@davclark would love to see a confusion matrix once there are classifiers. This could be another route to creating an input for something like ICA.
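For reference, a confusion matrix just counts how often each true label was predicted as each other label. A minimal sketch in plain Python, assuming the classifiers emit one predicted label per post; the mood labels and the `confusion_matrix` helper here are made up for illustration and are not from the repo:

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels):
    """Return (labels, matrix) where matrix[i][j] counts items whose
    true label is labels[i] and whose predicted label is labels[j]."""
    # Use every label seen on either side so the matrix is square.
    labels = sorted(set(true_labels) | set(predicted_labels))
    counts = Counter(zip(true_labels, predicted_labels))
    matrix = [[counts[(t, p)] for p in labels] for t in labels]
    return labels, matrix


# Hypothetical labels, only to show the shape of the output.
y_true = ["happy", "sad", "sad", "angry", "happy", "sad"]
y_pred = ["happy", "sad", "happy", "sad", "happy", "sad"]

labels, matrix = confusion_matrix(y_true, y_pred)
for label, row in zip(labels, matrix):
    print(f"{label:>6}: {row}")
```

Rows are true labels and columns are predicted labels, so off-diagonal entries show which categories get mistaken for each other. That row-by-column count table is the kind of matrix that could then be handed to a decomposition step.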

cc @anasrferreira

coryschillaci commented 9 years ago

The cutoff in the trimming is actually around 41 now that we have stripped the <base64> stuff out, so I don't think there is a reason to push up to 10^7 tokens.
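A rough sketch of how a count cutoff trades off against the number of tokens kept, assuming the cutoff is a minimum number of occurrences per word. That reading, the `trim_vocabulary` helper, and the toy documents are all assumptions for illustration, not the actual trimming code:

```python
from collections import Counter


def trim_vocabulary(documents, min_count):
    """Keep words that occur at least min_count times across all documents.

    Returns the kept word counts and the total number of tokens they cover.
    """
    counts = Counter(token for doc in documents for token in doc)
    kept = {word: n for word, n in counts.items() if n >= min_count}
    return kept, sum(kept.values())


# Toy pre-tokenized posts; real input would be the cleaned corpus.
docs = [
    ["i", "feel", "tired"],
    ["i", "feel", "fine"],
    ["i", "am", "fine"],
]

vocab, n_tokens = trim_vocabulary(docs, min_count=2)
print(sorted(vocab), n_tokens)
```

Running this over a range of `min_count` values would show how many tokens survive at each cutoff, which is one way to check whether a given cutoff already covers enough of the corpus.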