Closed daholste closed 4 years ago
Related to #2802
@justinormont and @shauheen, do you want this to go in V1.0?
That's up to @shauheen. I'd say yes, as there are strong upsides in accuracy. Note the large jump in accuracy (y-axis) when we move from the blue line to the green line in the above graph.
The power of defaults should never be underestimated.
Related: https://github.com/dotnet/machinelearning/issues/2305
Tracking in #4749
@justinormont and the text team tuned the default n-gram lengths for the default text recipe in the internal repo.
These defaults are:

- Word: bigrams (w/ unigrams)
- Character: trigrams (w/o unigrams and bigrams)
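
To make the two settings concrete, here is a minimal Python sketch of which features each one would produce. It is not ML.NET code: the function names, the `all_lengths` flag, and the whitespace tokenization are assumptions made for illustration, and a real featurizer may differ in casing, tokenization, and boundary markers.

```python
def word_ngrams(text, max_len=2, all_lengths=True):
    """Word n-grams; with all_lengths=True, shorter n-grams are kept too."""
    tokens = text.lower().split()  # assumption: simple whitespace tokenization
    lengths = range(1, max_len + 1) if all_lengths else [max_len]
    return [
        " ".join(tokens[i:i + n])
        for n in lengths
        for i in range(len(tokens) - n + 1)
    ]


def char_ngrams(text, max_len=3, all_lengths=False):
    """Character n-grams; with all_lengths=False, only length max_len is kept."""
    chars = text.lower()  # assumption: no start/end boundary markers
    lengths = range(1, max_len + 1) if all_lengths else [max_len]
    return [
        chars[i:i + n]
        for n in lengths
        for i in range(len(chars) - n + 1)
    ]


sample = "the cat sat"

# Word setting: bigrams with unigrams.
print(word_ngrams(sample))
# Character setting: trigrams only, no unigrams or bigrams.
print(char_ngrams(sample))
```

For the sample string, the word setting yields both single words and adjacent word pairs, while the character setting yields only three-character windows.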
One chart from his findings:
- The line w/ the light blue call-out represents the current ML.NET defaults (Unigram + Trichar).
- The line w/ the light green call-out is the requested change (Bigram + Trichar).
- The line w/ the pink call-out shows that Trigram + Trichar is better in terms of accuracy, but with a time hit, and the accuracy has a crossover at NumIterations > 8 for the Averaged Perceptron learner.