acadTags closed this issue 6 years ago.
Hi. I think these two lines are different.
The line below is a fully connected layer that performs an additional transform, which increases the model's capacity; you can also find this usage in many classic CNN models.
self.h_drop=tf.layers.dense(self.h_drop,self.num_filters_total,activation=tf.nn.tanh,use_bias=True)
However, the line below is just a linear transform to match the label size; as you can see, there is no activation function:
logits = tf.matmul(self.h_drop,self.W_projection) + self.b_projection
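To make the distinction concrete, here is a minimal NumPy sketch (shapes and variable values are illustrative, not taken from the repository): the dense layer applies a non-linear tanh transform while keeping the feature width, whereas the projection is a purely linear map down to the number of labels.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, num_filters_total, num_classes = 4, 8, 3

h_drop = rng.standard_normal((batch, num_filters_total))

# Fully connected layer with tanh: a non-linear transform that keeps
# the feature width at num_filters_total (like tf.layers.dense above).
W_dense = rng.standard_normal((num_filters_total, num_filters_total))
b_dense = np.zeros(num_filters_total)
h = np.tanh(h_drop @ W_dense + b_dense)   # [batch, num_filters_total]

# Linear projection: no activation, just matches the label size
# (like tf.matmul(self.h_drop, self.W_projection) + self.b_projection).
W_projection = rng.standard_normal((num_filters_total, num_classes))
b_projection = np.zeros(num_classes)
logits = h @ W_projection + b_projection  # [batch, num_classes]

print(logits.shape)  # (4, 3)
```

Without the tanh in between, the two layers would collapse into a single linear map, so the non-linearity is what makes the extra layer add capacity.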
Thank you. I will see how the model performs with and without this dense layer.
Actually, the logits are later passed through an activation function (softmax or sigmoid) for prediction. In this code, the activation is applied inside tf.nn.sparse_softmax_cross_entropy_with_logits or tf.nn.sigmoid_cross_entropy_with_logits:
losses = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=self.input_y, logits=self.logits);
losses = tf.nn.sigmoid_cross_entropy_with_logits(labels=self.input_y_multilabel, logits=self.logits)
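As a sanity check on what the fused op does, here is a small NumPy sketch (logits and labels are made up for illustration): tf.nn.sparse_softmax_cross_entropy_with_logits applies softmax internally and then takes the negative log-probability of the true class, so no activation is needed on the logits beforehand.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row-wise max for numerical stability.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 0.3]])
labels = np.array([0, 2])  # sparse integer class labels

# Equivalent of tf.nn.sparse_softmax_cross_entropy_with_logits:
# softmax inside the op, then negative log-likelihood of the true class.
probs = softmax(logits)
losses = -np.log(probs[np.arange(len(labels)), labels])
print(losses)
```

Fusing the softmax into the loss op is also more numerically stable than computing softmax first and taking the log afterwards.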
Thanks a lot for your code!
May I ask about the line
self.h_drop=tf.layers.dense(self.h_drop,self.num_filters_total,activation=tf.nn.tanh,use_bias=True)
which comes before the logits? It seems this is already a fully connected layer with a bias term; should we still compute the logits and apply softmax to them?
Is there any theoretical reason for this additional dense layer? Many thanks!
```python
#4.=====>add dropout: use tf.nn.dropout
with tf.name_scope("dropout"):
    self.h_drop = tf.nn.dropout(self.h_pool_flat, keep_prob=self.dropout_keep_prob)  # [None, num_filters_total]
self.h_drop = tf.layers.dense(self.h_drop, self.num_filters_total, activation=tf.nn.tanh, use_bias=True)
#5. logits (use linear layer) and predictions (argmax)
```