fchollet / deep-learning-with-python-notebooks

Jupyter notebooks for the code samples of the book "Deep Learning with Python"
MIT License

IMDB example, why we have 1 neuron in the last layer #189

Open Kuaranir opened 2 years ago

Kuaranir commented 2 years ago

In the IMDB example, why did we initialize the last layer with only 1 neuron, even though we have two classes (positive and negative reviews)?

model.add(layers.Dense(1, activation='sigmoid'))
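For context, the surrounding model from the book's IMDB notebook looks roughly like this (a sketch; the 10,000-dimensional input assumes the book's multi-hot vectorization of the reviews):

from keras import models, layers

model = models.Sequential()
model.add(layers.Dense(16, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(16, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))  # single probability output

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',  # pairs with the sigmoid output
              metrics=['accuracy'])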

pkienle commented 2 years ago

How are the true labels encoded in the example you are referring to? I assume they are encoded as 0 or 1, not as [1, 0] (class 1) or [0, 1] (class 2), right?

In that case a single output is enough: a sigmoid output below 0.5 can be read as a negative prediction, and one above 0.5 as a positive prediction.
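A minimal sketch of that decision rule (model and x_test are assumed to be the trained IMDB model and the vectorized test data):

probs = model.predict(x_test)            # shape (num_samples, 1): P(positive)
preds = (probs[:, 0] > 0.5).astype(int)  # 1 = positive review, 0 = negative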

Kuaranir commented 2 years ago

I thought we should set as many neurons in the output layer as there are classes. At least that's what I was taught in DL courses...

pkienle commented 2 years ago

That's correct in general, but for a binary classification problem, a sigmoid activation with a single output neuron is sufficient.

PelFritz commented 2 years ago

Hi @Kuaranir, for binary classification problems, if you use a sigmoid activation, you use one neuron in the output layer. This is because with a sigmoid activation your network predicts a single probability: the probability of success (the probability of class 1). The probability of failure is then simply 1 - P(success), so a single unit is enough.

If you want to predict a probability for each class explicitly, you can change the activation function to softmax and use 2 units (neurons). Hope that helps :)
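For completeness, here is a sketch of that two-unit softmax variant (assuming the labels stay as plain integers 0/1, which pair with sparse_categorical_crossentropy):

from keras import models, layers

model = models.Sequential()
model.add(layers.Dense(16, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(2, activation='softmax'))  # one probability per class

model.compile(optimizer='rmsprop',
              loss='sparse_categorical_crossentropy',  # integer labels 0/1
              metrics=['accuracy'])

For two classes the two setups are mathematically equivalent; the sigmoid version just folds the two probabilities into one output.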

Kuaranir commented 2 years ago

@pkienle thanks)

Kuaranir commented 2 years ago

Thanks!)👍
