Closed by cchderrick 5 years ago
The preprocessing is somewhat broken. You could try StringAnalysis; its preprocessing should get backported here at some point.
using StringAnalysis
doc = "Intel(tm) Core i5-3300k, is a great CPU! ";
s1 = prepare(doc, strip_punctuation);
s2 = prepare(doc, strip_punctuation|strip_numbers);
s3 = prepare(doc, strip_punctuation|strip_whitespace);
@show s1
@show s2
@show s3
# s1 = "Intel tm Core i5 3300k is a great CPU "
# s2 = "Intel tm Core i k is a great CPU "
# s3 = "Intel tm Core i5 3300k is a great CPU "
Thanks for the pointer, I will try it out in the meantime.
Strange...