WinVector / pyvtreat

vtreat is a data frame processor/conditioner that prepares real-world data for predictive modeling in a statistically sound manner. Distributed under a BSD-3-Clause license.
https://winvector.github.io/pyvtreat/

Column name update/seeding tutorials #11

Closed MarkMoretto closed 4 years ago

MarkMoretto commented 4 years ago

Hi. I didn't want to fork the repo for this, but under the Python classification example in the exploratory section, the notebook says:

'Find the mean value of yc'

I think 'yc' is a nominal column and finding the mean wouldn't be possible. With that in mind, here are two friendly suggestions:

  1. Add something like numpy.random.seed(42) or another seed value at the top of the examples for reproducibility by those following the tutorial.
  2. Update the mean value sections. I could be wrong and may have misread the document, but I went through another of the tutorials, and some of the text copied over from it may have been mislabeled.

Other than that, the package looks interesting so far.

Thanks!

JohnMount commented 4 years ago

Definitely will update the documentation.

Adding numpy.random.seed() is probably a good idea.
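As a rough illustration of the seeding suggestion, here is a minimal sketch. The seed value 42 comes from the suggestion above; the normal draws and the variable names are assumptions chosen for illustration and are not taken from the tutorial notebooks.

```python
import numpy

# Seed the global NumPy generator so later random draws are repeatable.
numpy.random.seed(42)
first = numpy.random.normal(size=5)

# Re-seeding with the same value replays the same sequence of draws.
numpy.random.seed(42)
second = numpy.random.normal(size=5)

same_draws = bool(numpy.array_equal(first, second))
print(same_draws)
```

Placing the seed call once at the top of a notebook should be enough for readers to get the same synthetic data on each full run, provided the cells are executed in order.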

The mean value of y is well defined for logicals: it is the prevalence of the true class. We will expand on that a bit and rewrite it as numpy.mean(d['yc'] == outcome_target).
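To make the prevalence reading concrete, here is a small sketch. The data frame contents and the target label 'large' are hypothetical stand-ins, not the tutorial's actual data; only the expression numpy.mean(d['yc'] == outcome_target) comes from the comment above.

```python
import numpy
import pandas

# Hypothetical nominal outcome column; the real tutorial data will differ.
d = pandas.DataFrame({'yc': ['small', 'large', 'small', 'large', 'large']})
outcome_target = 'large'  # assumed label for the class of interest

# Comparing to the target yields a boolean column (True where yc matches).
is_target = d['yc'] == outcome_target

# The mean of booleans is the fraction of True values, i.e. the prevalence.
prevalence = numpy.mean(is_target)
print(prevalence)
```

Here three of the five rows match the target, so the prevalence is 3/5. Taking the mean of the nominal column itself would not be meaningful, which is why the comparison comes first.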