dafriedman97 / mlbook

Repository for the free online book Machine Learning from Scratch (link below!)
https://dafriedman97.github.io/mlbook/content/introduction.html
MIT License

neural network hidden layers #11

Open MatthiVH opened 3 years ago

MatthiVH commented 3 years ago

Hi,

I read your eBook on machine learning, which explains everything about linear regression, neural networks, etc. very well. It's really helpful. However, I have a question about the neural network implementation in Python (https://dafriedman97.github.io/mlbook/content/c7/construction.html).

For simplicity, the model has only one layer between the input and the output, as stated at the beginning of the text. What about the following line in the code, then? `ffnn.fit(X_boston_train, y_boston_train, n_hidden = 8)` Here `n_hidden` is set to 8. Can you elaborate on this? What exactly does `n_hidden` mean, and why is it set to 8 in this example?

Kind regards, Matthias

dafriedman97 commented 3 years ago

Hi Matthias,

I'm glad you're finding the book helpful!

I think the confusion is between the number of hidden layers and the number of nodes per hidden layer. I assumed the model had only one hidden layer; however, that hidden layer has 8 nodes. The name `n_hidden` may have been a bit confusing.
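To make the layers-versus-nodes distinction concrete, here is a minimal NumPy sketch (not the book's code). A network is described by a list of layer sizes, so a single hidden layer with 8 nodes corresponds to sizes `[n_inputs, 8, n_outputs]`. The helpers `init_network`, `forward` and `n_params` are hypothetical, the ReLU activation is an assumption, and the layer counts are illustrative: 13 inputs and 1 output for the single-hidden-layer case (the Boston housing data has 13 predictors), and a made-up 4 inputs and 1 output for the two-hidden-layer case.

```python
import numpy as np

def init_network(layer_sizes, seed=0):
    """Create one (weights, biases) pair per connection between consecutive layers.

    layer_sizes = [n_inputs, hidden_1, ..., hidden_L, n_outputs]
    """
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = rng.normal(scale=0.1, size=(n_in, n_out))  # one column per node in the next layer
        b = np.zeros(n_out)                             # one bias per node in the next layer
        params.append((W, b))
    return params

def forward(X, params):
    """Forward pass: ReLU on hidden layers (assumed), identity on the output layer."""
    h = X
    for i, (W, b) in enumerate(params):
        h = h @ W + b
        if i < len(params) - 1:  # every layer except the output is hidden
            h = np.maximum(h, 0.0)
    return h

def n_params(params):
    """Total number of trainable weights and biases."""
    return sum(W.size + b.size for W, b in params)

# One hidden layer with 8 nodes (hypothetical 13 inputs, 1 output)
one_hidden = init_network([13, 8, 1])
print(len(one_hidden) - 1)               # 1 hidden layer
print([W.shape for W, _ in one_hidden])  # [(13, 8), (8, 1)]
print(n_params(one_hidden))              # 13*8 + 8 + 8*1 + 1 = 121

# Two hidden layers with 3 nodes each (hypothetical 4 inputs, 1 output)
two_hidden = init_network([4, 3, 3, 1])
print([W.shape for W, _ in two_hidden])  # [(4, 3), (3, 3), (3, 1)]
print(n_params(two_hidden))              # 15 + 12 + 4 = 31

print(forward(np.ones((5, 13)), one_hidden).shape)  # (5, 1)
```

The number of hidden layers is the length of the size list minus two; changing the 8 only changes the width of that single hidden layer.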

Consider the diagram below. It has two hidden layers, but each hidden layer has 3 nodes. Does that clarify things? Please let me know if not.

[Diagram: a network with two hidden layers of 3 nodes each (screenshot dated 2020-10-05)]

MatthiVH commented 3 years ago

Hi,

Oh I see, so it's the number of nodes in the one hidden layer.

1) OK, but then why is the number of nodes in that hidden layer set to 8? Is it trial and error to find which number of nodes gives the best result?

2) Can the number of nodes be higher than the number of input parameters? Or is it always the case that nNodes ≤ nParameters?

Kind regards, Matthias

dafriedman97 commented 3 years ago

Exactly: the number of nodes in the one hidden layer.

  1. I chose 8 fairly arbitrarily; maybe I should make that clearer. There is no one-size-fits-all rule for determining the optimal number of nodes, and finding it is largely trial and error. In practice, I would suggest using cross-validation to choose the number of nodes. There are some general rules of thumb, though. A common (but broad) one is that the number of nodes in the hidden layer should lie between the number of nodes in the input layer (the number of predictors) and the number in the output layer (the number of targets). Within that range, it's still mostly trial and error.
  2. There is no reason why the number of nodes can't be higher than the dimension of the input. However, more nodes means more parameters and therefore more potential for overfitting, so be careful!
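A minimal sketch of choosing the number of hidden nodes by k-fold cross-validation, written in plain NumPy rather than with the book's `ffnn` class. The helpers `train_mlp`, `predict` and `cv_mse`, the ReLU hidden units, the plain full-batch gradient descent, the learning rate, and the synthetic data are all assumptions for illustration. The candidate list deliberately includes sizes larger than the number of inputs (4), since nothing forbids that; the validation error, not the rule of thumb, makes the final call.

```python
import numpy as np

def train_mlp(X, y, n_hidden, lr=0.05, n_iter=1000, seed=0):
    """Fit a 1-hidden-layer ReLU network by full-batch gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    W1 = rng.normal(scale=1 / np.sqrt(p), size=(p, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=1 / np.sqrt(n_hidden), size=(n_hidden, 1))
    b2 = np.zeros(1)
    y = y.reshape(-1, 1)
    for _ in range(n_iter):
        z1 = X @ W1 + b1            # hidden pre-activations, shape (n, n_hidden)
        h1 = np.maximum(z1, 0.0)    # ReLU
        yhat = h1 @ W2 + b2         # output, shape (n, 1)
        d_yhat = 2 * (yhat - y) / n # gradient of mean((yhat - y)^2)
        dW2 = h1.T @ d_yhat
        db2 = d_yhat.sum(axis=0)
        d_z1 = (d_yhat @ W2.T) * (z1 > 0)  # backprop through ReLU
        dW1 = X.T @ d_z1
        db1 = d_z1.sum(axis=0)
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return W1, b1, W2, b2

def predict(params, X):
    W1, b1, W2, b2 = params
    return (np.maximum(X @ W1 + b1, 0.0) @ W2 + b2).ravel()

def cv_mse(X, y, n_hidden, k=5, seed=0):
    """Average validation MSE over k folds for a given hidden-layer width."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        params = train_mlp(X[train], y[train], n_hidden)
        errors.append(np.mean((predict(params, X[val]) - y[val]) ** 2))
    return float(np.mean(errors))

# Synthetic (hypothetical) data: 4 predictors, 1 target
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = np.sin(X[:, 0]) + X[:, 1] * X[:, 2] + 0.1 * rng.normal(size=200)

candidates = [1, 2, 4, 8, 16]  # includes widths above the 4 inputs
scores = {h: cv_mse(X, y, h) for h in candidates}
best = min(scores, key=scores.get)
print(scores)
print("chosen n_hidden:", best)
```

With one output, a hidden layer of width h on p predictors has (p + 1)·h parameters feeding the hidden layer plus h + 1 feeding the output, so each extra node adds p + 2 parameters. That growth is where the extra room to overfit comes from, and it should show up as higher validation error for widths that are too large.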