dvgodoy / PyTorchStepByStep

Official repository of my book: "Deep Learning with PyTorch Step-by-Step: A Beginner's Guide"
https://pytorchstepbystep.com
MIT License

Bonus Chapter Feature Space: Why do you say activation functions increase dimensionality? #38

Closed bigmisspanda closed 1 year ago

bigmisspanda commented 1 year ago

Activation functions are applied element-wise to individual neurons in a neural network, and their purpose is to introduce non-linearity into the network's computations. That non-linearity is what allows neural networks to learn and approximate complex functions, so the activation functions themselves do not directly increase dimensionality. I'm confused about your point on this. Could you explain it to me in more detail? Thank you.

dvgodoy commented 1 year ago

Hi @bigmisspanda

First of all, thank you for supporting my work :-) Let me try to address your question. I am assuming you're referring to this passage in the book:

"In Chapter 4, we established that, without activation functions, a deeper model has an equivalent shallow model (a logistic regression, in case of a binary classification). This means we need an activation function to be able to effectively increase dimensionality and, more important, to twist and turn the feature space."

You're right when you say that activation functions themselves do not directly increase the dimensionality, but they are required to make higher-dimensional hidden layers effective. Without activation functions, we could simply multiply all the matrices and get the equivalent single-layer model in the end.
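
Concretely, for two stacked linear layers with weights $W_1, W_2$ and biases $b_1, b_2$ (generic notation, not necessarily the book's), the composition is still a single affine map:

$$W_2(W_1 x + b_1) + b_2 = (W_2 W_1)\,x + (W_2 b_1 + b_2)$$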

Let's take a sequential model such as this:

nn.Sequential(nn.Linear(2, 10), nn.Linear(10, 2))

Since there are no activation functions, this model is equivalent to:

nn.Sequential(nn.Linear(2, 2))
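
Here's a minimal sketch that verifies this equivalence numerically by collapsing the two layers by hand (the names deep, shallow, and x are mine, just for illustration):

import torch
import torch.nn as nn

torch.manual_seed(42)

# two stacked linear layers, no activation in between
deep = nn.Sequential(nn.Linear(2, 10), nn.Linear(10, 2))

# collapse them into a single layer: W = W2 @ W1, b = W2 @ b1 + b2
shallow = nn.Linear(2, 2)
with torch.no_grad():
    w1, b1 = deep[0].weight, deep[0].bias
    w2, b2 = deep[1].weight, deep[1].bias
    shallow.weight.copy_(w2 @ w1)
    shallow.bias.copy_(w2 @ b1 + b2)

x = torch.randn(5, 2)
print(torch.allclose(deep(x), shallow(x), atol=1e-6))  # True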

But, if we introduce activations, the equivalence is broken. This means the model effectively maps the inputs from a two-dimensional space to a 10-dimensional space, twisted and turned by the activation, before going back to 2D in the end:

nn.Sequential(nn.Linear(2, 10), nn.ReLU(), nn.Linear(10, 2))
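
A quick way to confirm the equivalence is broken: any affine model f satisfies f(a + b) - f(0) == (f(a) - f(0)) + (f(b) - f(0)), but this identity fails once ReLU sits between the layers. A minimal sketch (variable names are mine):

import torch
import torch.nn as nn

torch.manual_seed(42)
model = nn.Sequential(nn.Linear(2, 10), nn.ReLU(), nn.Linear(10, 2))

a, b = torch.randn(1, 2), torch.randn(1, 2)
f0 = model(torch.zeros(1, 2))
lhs = model(a + b) - f0
rhs = (model(a) - f0) + (model(b) - f0)
# an affine map would make these equal; with ReLU they almost always differ
print(torch.allclose(lhs, rhs))  # typically False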

For more on the equivalence, see also Equations 4.2 and 4.3 in Chapter 4. I hope this helps. Best, Daniel

bigmisspanda commented 1 year ago

Thank you, Daniel. Now I see why in the book you say activations are a map between dimensions. Great work on the book, very impressive. Much respect.