boxuancui / DataExplorer

Automate Data Exploration and Treatment
http://boxuancui.github.io/DataExplorer/

Fix misleadingness in vignette #55

Closed by boxuancui 6 years ago

boxuancui commented 6 years ago

As pointed out by @peSHIr here, the data needs to be cleaned before grouping.

The vignette aircraft examples are a bit misleading, I think, as the data needs a bit more cleanup. Airbus is in the list under two different strings, McDonnell Douglas under at least three, and Canada under two. If those were first merged into one group each, before lumping the long tail together into an "other" bin, it could make a big difference in further modeling: Airbus would jump to being the largest group by far, not the third, whereas right now about half of the Airbus data is lumped into "other". #oops
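To make the ordering issue concrete, here is a minimal sketch in plain Python of "merge variant spellings first, then lump the tail". It is not DataExplorer's own grouping logic. The alias map, the helper names `canonical` and `lump_tail`, the share threshold, and the counts are all made up for illustration and are not taken from the vignette data.

```python
from collections import Counter

# Hypothetical alias map: variant spellings -> one canonical label.
# A real cleanup would list every variant found in the data.
ALIASES = {
    "AIRBUS INDUSTRIE": "AIRBUS",
}


def canonical(label):
    """Normalize whitespace and case, then collapse known variants."""
    key = label.strip().upper()
    return ALIASES.get(key, key)


def lump_tail(labels, min_share, other="OTHER"):
    """Replace every label whose share of rows is below min_share."""
    counts = Counter(labels)
    total = sum(counts.values())
    keep = {name for name, n in counts.items() if n / total >= min_share}
    return [name if name in keep else other for name in labels]


# Toy data (invented counts): Airbus appears under two strings.
raw = (
    ["BOEING"] * 6
    + ["EMBRAER"] * 5
    + ["AIRBUS"] * 4
    + ["AIRBUS INDUSTRIE"] * 3
    + ["CESSNA"]
    + ["PIPER"]
)

# Lumping first: the smaller Airbus variant disappears into OTHER.
before = Counter(lump_tail(raw, min_share=0.18))

# Cleaning first: both Airbus variants are counted together.
after = Counter(lump_tail([canonical(x) for x in raw], min_share=0.18))

print(before.most_common())
print(after.most_common())
```

In this toy data, lumping the raw labels leaves Airbus behind Boeing and Embraer, with part of its rows hidden in the "other" bin. Cleaning first makes it the largest group, which is the effect described above.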

peSHIr commented 6 years ago

I never thought at the time to add this as an issue on GitHub myself. Thanks for taking the time to find me on here and link to me. Kudos.