SiluPanda opened 5 years ago
Hmm, this looks very interesting, but I am not sure if the AdaBoost implementation is needed at the moment. For that, only @norvig can respond, but he is very busy at the moment. If you are doing this for GSoC, you can add in your proposal that you will do some work on neural networks. I think that would be a really interesting idea, but as a PR, it would be too large for me to merge.
Yes, thank you, I'll add that to my proposal.
Another small query: what exactly is `size` here? I am looking to patch the infinite loop.
Thanks a lot for the response!
To be honest, I don't know. The cross validation pseudocode is not up to date and we don't know what to do.
This is what `size` means, from the book:
> In this section we explain how to select among models that are parameterized by size. For example, with polynomials we have size = 1 for linear functions, size = 2 for quadratics, and so on. For decision trees, the size could be the number of nodes in the tree. In all cases we want to find the value of the size parameter that best balances underfitting and overfitting to give the best test set accuracy.
Clearly, `size` should not go to infinity (as it does in the pseudocode); the upper limit is model-specific. I guess the best option is to wait for an update on the pseudocode from @norvig sir.
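In the meantime, here is a minimal sketch of what a bounded version of that loop could look like. The `model_factory` interface (`.fit(train)` / `.error(val)`) and the `max_size` cap are my own assumptions for illustration, not the repo's actual API:

```python
import random

def cross_validation_error(model_factory, size, examples, k=10):
    """Average validation error over k folds for a model of the given size.
    `model_factory(size)` is assumed to return an object with .fit(train)
    and .error(val) methods -- a hypothetical interface for this sketch."""
    examples = list(examples)
    random.shuffle(examples)
    fold_len = len(examples) // k
    errors = []
    for i in range(k):
        val = examples[i * fold_len:(i + 1) * fold_len]
        train = examples[:i * fold_len] + examples[(i + 1) * fold_len:]
        model = model_factory(size)
        model.fit(train)
        errors.append(model.error(val))
    return sum(errors) / k

def model_selection(model_factory, examples, max_size=50, k=10):
    """Search size = 1..max_size (a finite, model-specific cap) and return
    the size with the lowest cross-validation error, instead of letting
    size grow without bound as in the current pseudocode."""
    return min(range(1, max_size + 1),
               key=lambda s: cross_validation_error(model_factory, s, examples, k))
```

The cap would need to be chosen per model (e.g. the maximum polynomial degree or tree node count that makes sense for the dataset), which is exactly why the pseudocode leaves it unspecified.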
Are we using sampled data for hypothesis training in AdaBoost? If not, shouldn't we be doing that? By sampled data I mean choosing a set of data points from the training set according to their weights, which increases the probability that a misclassified point is chosen by the next hypothesis. Here is the implementation of AdaBoost done by me; I have implemented sampling of the data there.
Let me know if this needs to be done.
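For clarity, a minimal sketch of the sampling step I have in mind, using `random.choices`; the names here are just for illustration, not taken from my implementation:

```python
import random

def weighted_sample(examples, weights):
    """Draw a training sample (with replacement) where each example is
    picked with probability proportional to its current AdaBoost weight,
    so recently misclassified, high-weight points are more likely to
    reappear in the next hypothesis's training set."""
    return random.choices(examples, weights=weights, k=len(examples))

# Example: after a round in which 'x2' was misclassified and its weight
# increased, it tends to appear more often in the next round's sample.
examples = ['x0', 'x1', 'x2', 'x3']
weights = [0.1, 0.1, 0.6, 0.2]
print(weighted_sample(examples, weights))
```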