ClimbsRocks / machineJS

[UNMAINTAINED] Automated machine learning- just give it a data file! Check out the production-ready version of this project at ClimbsRocks/auto_ml
https://github.com/ClimbsRocks/auto_ml

Questions on hyperparameter distributions and validation percentage #174

Open ClimbsRocks opened 8 years ago

ClimbsRocks commented 8 years ago

Question from @MelvinDunn that I'm documenting here:

Had one question while I was looking at the ol' machina:

- How does this machine determine the starting points for hyperparams?
- How does it determine the validation size? (Couldn't find it)

Sorry, I was interested, and while I know I could easily just look at the code myself, I thought you would know off the top of your head.

I'm extremely interested in AutoML, and I think this machine is, well, wonderful.

Thanks again,

Melvin

My response: I love curiosity- thanks for continuing to ask questions!

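  1. We use scikit-learn's RandomizedSearchCV to search for good hyperparameters. There are no fixed starting points: it samples each candidate's parameters at random from the distributions we give it and keeps the best-scoring combination. Those distributions can be found in pySetup/parameterMakers.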
  2. Right now the validation size is hard-coded. It's a fairly large split: I've experimented with different values, and I believe it's somewhere around 20-40%, depending on the size of the input data. The exception is data like Numer.ai's that comes with a specific validationSplit column, which must be specified in the dataDescription row (where we specify what type of data each column holds). In that case we just use that validation split.
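
For anyone who wants to see the two ideas side by side, here is a minimal sketch, not the actual machineJS code. It assumes a random forest classifier, made-up parameter distributions (the real ones live in pySetup/parameterMakers and are not reproduced here), and a hypothetical `pick_validation_fraction` helper whose size thresholds are invented purely to illustrate a 20-40% split that shrinks as the data grows.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split


def pick_validation_fraction(n_rows):
    """Hypothetical rule: hold out a smaller share as the data grows.

    The thresholds are assumptions for illustration only.
    """
    if n_rows < 10_000:
        return 0.4
    if n_rows < 100_000:
        return 0.3
    return 0.2


# Synthetic stand-in for a user's data file.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Hold out a validation set whose size depends on the number of rows.
val_fraction = pick_validation_fraction(len(X))
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=val_fraction, random_state=0
)

# Assumed distributions: each candidate draws one value from each of these.
param_distributions = {
    "n_estimators": randint(10, 50),    # integers in [10, 50)
    "max_depth": randint(2, 10),        # integers in [2, 10)
    "max_features": uniform(0.3, 0.7),  # floats in [0.3, 1.0]
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,        # number of random parameter combinations to try
    cv=3,            # cross-validation folds used to score each combination
    random_state=0,
)
search.fit(X_train, y_train)

print("best sampled params:", search.best_params_)
print("validation accuracy:", search.score(X_val, y_val))
```

Because every candidate is drawn independently from the distributions, there is no "starting point" to tune. Widening or narrowing a distribution is the way to steer the search, and `n_iter` controls how many draws it gets.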

Keep the questions coming!