As pointed out by @peSHIr here, the data needs to be cleaned before grouping.
The vignette's aircraft examples are a bit misleading, I think, because the data needs more cleanup first. Airbus is in the list under two different strings, McDonnell Douglas under at least three, and Canada under two. If each of those were first merged into a single level, before lumping the long tail into an "other" bin, it could make a big difference in further modeling: Airbus would jump to being the largest group by far rather than the third, whereas currently about half of the Airbus data gets lumped into "other". #oops
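To make the effect concrete, here is a minimal sketch in plain Python (not the vignette's own code, which is not reproduced here). The counts, the `MAKER_*` labels, the two Airbus spellings, and the `CANONICAL` mapping are all invented for illustration; only the general pattern (one manufacturer split across several strings) comes from the comment above.

```python
from collections import Counter

# Hypothetical labels and counts, invented for illustration only.
raw = (
    ["MAKER_A"] * 10
    + ["MAKER_B"] * 9
    + ["AIRBUS INDUSTRIE"] * 8   # assumed variant spelling
    + ["AIRBUS"] * 7             # assumed variant spelling
    + ["MAKER_C"] * 2
    + ["MAKER_D"] * 1
)

# Assumed mapping from variant spellings to one canonical name.
CANONICAL = {"AIRBUS INDUSTRIE": "AIRBUS"}


def clean(labels):
    """Replace known variant spellings with their canonical name."""
    return [CANONICAL.get(label, label) for label in labels]


def lump(labels, n):
    """Keep the n most frequent labels and count everything else as 'other'."""
    keep = {label for label, _ in Counter(labels).most_common(n)}
    return Counter(label if label in keep else "other" for label in labels)


lumped_raw = lump(raw, 3)           # lump first: Airbus is split in two
lumped_clean = lump(clean(raw), 3)  # clean first, then lump

print(lumped_raw.most_common())
print(lumped_clean.most_common())
```

With these made-up numbers, lumping the raw labels keeps only one Airbus spelling as the third-largest group and pushes the other spelling into "other", while cleaning first makes Airbus the largest group and shrinks the "other" bin.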