Open rxy1212 opened 5 years ago
Between the bidirectional transformer of BERT and the output, there is a classification layer. You need this layer for fine-tuning, because the output should be a class, not embeddings (the transformer itself outputs embeddings).
BERT-as-service gives you a sentence embedding, so of course they removed the classification layer in order to retrieve the embeddings. That's why you get a 768-dimensional vector: that is the sentence embedding.
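As a sketch of what you can already do with those raw sentence embeddings before training any classifier: cosine similarity gives a crude unsupervised similarity score. The vectors below are random stand-ins for real 768-dimensional bert-as-service output, just to show the shape of the computation:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two 1-D embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-ins for 768-dim sentence embeddings from bert-as-service.
rng = np.random.default_rng(0)
emb_a = rng.normal(size=768)
emb_b = emb_a + 0.1 * rng.normal(size=768)  # a near-duplicate "sentence"
emb_c = rng.normal(size=768)                # an unrelated "sentence"

print(cosine_similarity(emb_a, emb_b))  # close to 1.0
print(cosine_similarity(emb_a, emb_c))  # close to 0.0
```

For a supervised similarity task this raw score is usually not enough, which is where the extra classifier discussed below comes in.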
When the fine-tuning is done, will the classifier be discarded?
Yes.
So we must train another classifier to do our own task like sentence similarity?
Yes, because what BERT-as-service gives you is an embedding. On top of this, you need to add a classification layer fitted to your data.
That makes sense, thank you!
Can you please let me know the steps followed / parameters sent for fine-tuning? Should I just call run_classifier's create_model(), or run run_classifier as a whole?
Hello, I see the cross-entropy loss path, where you take the two variables log_probs and one_hot_labels and compute per_example_loss = -tf.reduce_sum(one_hot_labels * log_probs, axis=-1). Can I ask how to change it to a hinge loss?
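One common option (a NumPy sketch of the standard multiclass max-margin hinge loss, not code from run_classifier.py): instead of -sum(one_hot_labels * log_probs), work on the raw logits and penalize every wrong class whose logit comes within a margin of the correct class's logit:

```python
import numpy as np

def cross_entropy_per_example(log_probs, one_hot_labels):
    """The loss the question quotes: -sum(one_hot * log_probs) per row."""
    return -np.sum(one_hot_labels * log_probs, axis=-1)

def hinge_per_example(logits, one_hot_labels, margin=1.0):
    """Multiclass hinge loss per row.

    For each example, each wrong class j contributes
    max(0, margin - logit_correct + logit_j).
    """
    correct = np.sum(logits * one_hot_labels, axis=-1, keepdims=True)
    margins = np.maximum(0.0, margin - correct + logits)
    # Mask out the correct class (its term would always equal `margin`).
    return np.sum(margins * (1.0 - one_hot_labels), axis=-1)

logits = np.array([[2.0, 1.5],   # correct class 0, margin violated by 0.5
                   [0.2, 1.5]])  # correct class 1, margin satisfied
labels = np.array([[1.0, 0.0],
                   [0.0, 1.0]])
print(hinge_per_example(logits, labels))  # → [0.5 0. ]
```

In the TensorFlow graph the same formula would replace the per_example_loss line, using the logits tensor rather than log_probs, since hinge loss operates on unnormalized scores.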
I just fine-tuned BERT on a classification task, and I noticed that a classifier is appended after BERT's output during fine-tuning.
create_model(...) in run_classifier.py
The fine-tuned model can predict my test set, and the result looks like "probability of 0, probability of 1" (it's a sentence similarity task). But what confuses me is that when I feed the fine-tuned model a sentence (not by running run_classifier.py, but by serving the model through bert-as-service,
https://github.com/hanxiao/bert-as-service
), I get a 768-dimensional vector. Does that mean we only use the classifier to fine-tune BERT, but when the fine-tuning is done the classifier is discarded? So we must train another classifier to do our own task, like sentence similarity?