davclark closed 9 years ago
John recommended increasing threshold to include ~10M tokens.
@davclark would love to see a confusion matrix once there are classifiers. This could be another route to creating an input for something like ICA.
cc @anasrferreira
The cutoff in the trimming is actually around 41 now that we have stripped the <base64> stuff out, so I don't think there is a reason to push up to 10^7 tokens.