Column name update/seeding tutorials

WinVector / pyvtreat

vtreat is a data frame processor/conditioner that prepares real-world data for predictive modeling in a statistically sound manner. Distributed under a BSD-3-Clause license.

Hi. I didn't want to fork the repo just for this, but the exploratory section of the Python classification example notebook says:

'Find the mean value of yc'

I think 'yc' is a nominal column, so taking its mean wouldn't be possible. With that in mind, here are two friendly suggestions:

1. Add something like numpy.random.seed(42), or another seed value, at the top of the examples so that those following the tutorial get reproducible results.
2. Update the mean value sections. I could be wrong and may have misread the document, but I went through another of the tutorials, and some of the text copied over from it may have been mislabeled.
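
To make both suggestions concrete, here is a rough sketch rather than the notebook's actual code. The make_example_frame helper, the way the columns are generated, and the assumption that 'yc' is a boolean derived from a numeric 'y' are all mine. If 'yc' really is boolean, its mean is just the fraction of True rows; if it is stored as string labels, a mean is not meaningful, which is the case I had in mind above.

```python
import numpy as np
import pandas as pd


def make_example_frame(n_rows=200, seed=42):
    """Build a small synthetic frame (hypothetical stand-in for the tutorial data)."""
    # Seeding the global NumPy generator makes the draws below repeatable.
    np.random.seed(seed)
    x = np.random.normal(size=n_rows)
    y = np.sin(x) + 0.1 * np.random.normal(size=n_rows)
    # Assumption: 'yc' is a boolean outcome derived from the numeric 'y'.
    return pd.DataFrame({"x": x, "y": y, "yc": y > 0.5})


# Two runs with the same seed produce identical frames.
d1 = make_example_frame()
d2 = make_example_frame()
frames_match = d1.equals(d2)

# For a boolean column, the mean is the fraction of True rows.
prevalence = d1["yc"].mean()

# If the outcome were stored as string labels instead, it would not be numeric,
# so a mean would not be meaningful; check the dtype before averaging.
labels = d1["yc"].map({True: "pos", False: "neg"})
labels_are_numeric = pd.api.types.is_numeric_dtype(labels)
```

Checking the dtype first, instead of calling the mean and catching an error, keeps the sketch independent of how a particular pandas version reports the failure.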

Other than that, the package looks interesting so far.

Thanks!

Column name update/seeding tutorials #11