MorvanZhou / tutorials

Machine learning tutorials
https://morvanzhou.github.io/tutorials
MIT License
11.86k stars 5.71k forks

Why put train_sizes = 1 ? #1

Closed jimmy-walker closed 8 years ago

jimmy-walker commented 8 years ago

In the learning_curve chapter (https://github.com/MorvanZhou/tutorials/blob/master/sklearnTUT/sk9_cross_validation2.py), train_sizes is set to [0.1, 0.25, 0.5, 0.75, 1]. I have a question: if train_size equals 1, then test_size equals 0, right? How can it compute test_loss when there is no data left to test on? Looking forward to your reply, thanks.

MorvanZhou commented 8 years ago

I don't think you can set train_size=[1] or train_size=1; both of these will give you an error. It has to be a list of values that are less than or equal to 1.

jimmy-walker commented 8 years ago

Sorry, I just meant that train_sizes is set to [0.1, 0.25, 0.5, 0.75, 1]. The call computes a separate result for each value in train_sizes; for example, it computes train_loss and test_loss when train_size equals 0.1, and so on. When train_size is 1, how are train_loss and test_loss computed? I would expect there to be no data left to test on. But the learning curve really does plot a value for train_size equal to 1.

MorvanZhou commented 8 years ago

When I first learned this, I had the same concern as you. But it turns out train_sizes does not work the way we assumed. Please refer to this link, which has a detailed explanation of how sklearn splits the training and testing data: http://blog.kaggle.com/2015/06/29/scikit-learn-video-7-optimizing-your-model-with-cross-validation/
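
To make the point concrete, here is a rough pure-Python sketch of the arithmetic. It is not sklearn's actual code. It assumes a float in train_sizes is treated as a fraction of the training fold produced by cross-validation, not of the whole dataset. The function name split_sizes, the 100 samples, and the 5 folds are made up for illustration.

```python
def split_sizes(n_samples, n_folds, train_fractions):
    """Return (train_subset_size, test_fold_size) pairs for one CV split.

    Hypothetical helper: it only mimics the bookkeeping, not sklearn itself.
    """
    test_fold = n_samples // n_folds      # samples held out for testing
    train_fold = n_samples - test_fold    # samples available for training
    # Each fraction is applied to the training fold only.
    return [(int(frac * train_fold), test_fold) for frac in train_fractions]


# Assumed numbers: 100 samples, 5-fold cross-validation.
sizes = split_sizes(100, 5, [0.1, 0.25, 0.5, 0.75, 1])
for n_train, n_test in sizes:
    print("train on", n_train, "samples, test on", n_test)
```

Under these assumptions, a fraction of 1 means "use the whole training fold" (80 samples here), while the 20 held-out samples are still available for computing test_loss.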

Kerrwy commented 8 years ago

Why is there the statement "from __future__ import print_function" at the top of the file? Can you tell me what it means?

MorvanZhou commented 8 years ago

Hi Kerrwy, that statement is nothing important. In Python 2, print is used differently than in Python 3. The statement lets the same code run on both versions of Python.
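
A small sketch of what the statement allows; the variable names and values are only for illustration. With the import in place, print is called as a function on both Python 2.6+ and Python 3. On Python 3 the import has no effect.

```python
from __future__ import print_function  # must come before other statements in the file

values = [0.1, 0.25, 0.5, 0.75, 1]
line = " ".join(str(v) for v in values)

# Function-call syntax, including keyword arguments such as sep,
# behaves the same on Python 2.6+ and Python 3 once the import is present.
print("train_sizes:", line)
print("train_loss", "test_loss", sep=" | ")
```

Without the import, Python 2 would treat print as a statement, and the sep keyword argument would be a syntax error.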

Kerrwy commented 8 years ago

Thanks for your response. I have another question: do in_size and out_size in your code refer to feature maps or to the number of neurons?

MorvanZhou commented 8 years ago

In short, in_size comes from the feature maps of the previous layer, and out_size is the number of filters in this layer. BTW, I have uploaded English CNN and RNN tutorials to my YouTube channel. You can check them out too.