acadTags closed this issue 6 years ago.
Hi. I think these two lines are different.
The line below is a fully connected layer that performs an additional transform, which increases the model's capacity; you can also find this usage in many classic CNN models.
self.h_drop=tf.layers.dense(self.h_drop,self.num_filters_total,activation=tf.nn.tanh,use_bias=True)
However, the line below is just a linear transform to match the label size; as you can see, there is no activation function:
logits = tf.matmul(self.h_drop,self.W_projection) + self.b_projection
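To make the distinction concrete, here is a minimal NumPy sketch (shapes and variable values are illustrative, not taken from the repository): the dense layer applies a non-linear tanh transform while keeping the feature width, whereas the projection is a purely linear map down to the number of labels.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, num_filters_total, num_classes = 4, 8, 3

h_drop = rng.standard_normal((batch, num_filters_total))

# Fully connected layer with tanh: a non-linear transform that keeps
# the feature width at num_filters_total (like tf.layers.dense above).
W_dense = rng.standard_normal((num_filters_total, num_filters_total))
b_dense = np.zeros(num_filters_total)
h = np.tanh(h_drop @ W_dense + b_dense)   # [batch, num_filters_total]

# Linear projection: no activation, just matches the label size
# (like tf.matmul(self.h_drop, self.W_projection) + self.b_projection).
W_projection = rng.standard_normal((num_filters_total, num_classes))
b_projection = np.zeros(num_classes)
logits = h @ W_projection + b_projection  # [batch, num_classes]

print(logits.shape)  # (4, 3)
```

Without the tanh in between, the two layers would collapse into a single linear map, so the non-linearity is what makes the extra layer add capacity.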
Thank you. I will see how the model performs with and without this dense layer.
Actually, the logits are later passed through an activation function (softmax or sigmoid) for prediction. In this code, the activation is applied inside tf.nn.sparse_softmax_cross_entropy_with_logits or tf.nn.sigmoid_cross_entropy_with_logits:
losses = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=self.input_y, logits=self.logits);
losses = tf.nn.sigmoid_cross_entropy_with_logits(labels=self.input_y_multilabel, logits=self.logits)
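As a sanity check on what the fused op does, here is a small NumPy sketch (logits and labels are made up for illustration): tf.nn.sparse_softmax_cross_entropy_with_logits applies softmax internally and then takes the negative log-probability of the true class, so no activation is needed on the logits beforehand.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row-wise max for numerical stability.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 0.3]])
labels = np.array([0, 2])  # sparse integer class labels

# Equivalent of tf.nn.sparse_softmax_cross_entropy_with_logits:
# softmax inside the op, then negative log-likelihood of the true class.
probs = softmax(logits)
losses = -np.log(probs[np.arange(len(labels)), labels])
print(losses)
```

Fusing the softmax into the loss op is also more numerically stable than computing softmax first and taking the log afterwards.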
Thanks a lot for your code!
May I ask about the line
self.h_drop=tf.layers.dense(self.h_drop,self.num_filters_total,activation=tf.nn.tanh,use_bias=True)
which comes before the logits? It seems this is already a fully connected layer with a bias term; should we still compute the logits and apply softmax to them?
Is there any theoretical reason for this additional dense layer? Many thanks!
```python
#4.=====>add dropout: use tf.nn.dropout
with tf.name_scope("dropout"):
    self.h_drop = tf.nn.dropout(self.h_pool_flat, keep_prob=self.dropout_keep_prob)  # [None, num_filters_total]
self.h_drop = tf.layers.dense(self.h_drop, self.num_filters_total, activation=tf.nn.tanh, use_bias=True)
#5. logits (use linear layer) and predictions (argmax)
```