Open shcup opened 5 years ago
By the way, here is a visualization that may help you understand the different BERT layers: https://github.com/hanxiao/bert-as-service#q-so-which-layer-and-which-pooling-strategy-is-the-best
Because the first token is [CLS], which is designed to be there and is later fine-tuned on the downstream task. Only after fine-tuning can [CLS], aka the first token, be a meaningful representation of the whole sentence. If you are interested in using (pretrained/fine-tuned) BERT for sentence encoding, please refer to my repo: https://github.com/hanxiao/bert-as-service. In particular, [CLS] isn't the only way to represent the sentence; please refer to this answer: https://github.com/hanxiao/bert-as-service#q-what-are-the-available-pooling-strategies
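For anyone who wants to try the two most common strategies quickly, here is a minimal sketch. It assumes the HuggingFace transformers library and the bert-base-uncased checkpoint, neither of which is used elsewhere in this thread; it only illustrates [CLS] pooling vs. mean pooling, it is not bert-as-service itself.

```python
# Minimal sketch: [CLS] pooling vs. mean pooling with HuggingFace transformers
# (an assumption -- the thread itself uses google-research/bert and bert-as-service).
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

inputs = tokenizer("BERT sentence encoding example.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

hidden = outputs.last_hidden_state             # shape: (1, seq_len, 768)

# Strategy 1: take the [CLS] token (position 0).
cls_vector = hidden[:, 0, :]                   # shape: (1, 768)

# Strategy 2: mean-pool over the real tokens, ignoring padding.
mask = inputs["attention_mask"].unsqueeze(-1)  # shape: (1, seq_len, 1)
mean_vector = (hidden * mask).sum(dim=1) / mask.sum(dim=1)

print(cls_vector.shape, mean_vector.shape)
```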
Why do you say that [CLS], aka the first token, represents the whole sentence only after fine-tuning? Why can't it represent the sentence before fine-tuning?
Because BERT is bidirectional, the [CLS] token is encoded with information from all tokens through the multi-layer encoding procedure, so its representation differs from sentence to sentence.
Hi, how can I get that [CLS] representation after using run_pretraining.py on domain-specific text?
I want sentence representations for my downstream tasks.
Any idea how to do this?
```bash
BERT_BASE_DIR="/home/cuiyi/repos/bert/model/chinese_L-12_H-768_A-12"

python extract_features.py \
  --input_file=./tmp.txt \
  --output_file=./tmp.jsonl \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
  --layers=-1,-2,-3,-4 \
  --max_seq_length=128 \
  --batch_size=8
```
Modify BERT_BASE_DIR to point to your new model path.
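Once tmp.jsonl is written, you still need to pull the [CLS] vector out of it. Here is a small sketch of one way to do that, assuming the JSON layout written by extract_features.py: one object per input line, with a "features" list (one entry per token) and, for each token, a "layers" list holding {"index": ..., "values": [...]}.

```python
# Sketch: collect the top-layer [CLS] vector for every line in tmp.jsonl.
import json
import numpy as np

cls_vectors = []
with open("tmp.jsonl", "r", encoding="utf-8") as f:
    for line in f:
        example = json.loads(line)
        first_token = example["features"][0]  # the [CLS] token
        # With --layers=-1,-2,-3,-4, index -1 is the last encoder layer.
        top_layer = next(l for l in first_token["layers"] if l["index"] == -1)
        cls_vectors.append(np.array(top_layer["values"], dtype=np.float32))

cls_matrix = np.stack(cls_vectors)  # shape: (num_input_lines, hidden_size)
print(cls_matrix.shape)
```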
Thanks a lot!
Have you trained a model and obtained sentence representations this way? How good was the output? I ask because I have read that the [CLS] token is better after fine-tuning the model.
Not yet, but many people have used this as a basic step in their own work.
Hey, can you explain a little more how [CLS] captures the entire sentence's meaning? I fine-tuned the BERT uncased small model for a text classification task,
and I wanted to use the last-layer representation of the [CLS] token to understand the false positives. For instance, I thought that looking at the most similar representations from the training set would give me some insight into the wrong results. But the top-k similar representations I get are not really similar.
Everywhere it is mentioned that the [CLS] token representation works for the fine-tuned task, and it does work for mine: the accuracy is good. But when I interpret the similar sentences, the story is otherwise.
What do you think? Thanks in advance.
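To make the top-k check concrete, here is a hedged sketch of the kind of nearest-neighbour lookup described above: given a matrix of fine-tuned [CLS] vectors for the training set and one query vector for a false positive, return the k most similar training examples by cosine similarity. The variable names are hypothetical; plug in your own vectors however you extracted them.

```python
# Sketch: top-k nearest neighbours by cosine similarity over [CLS] vectors.
import numpy as np

def top_k_similar(query_vec, train_matrix, k=5):
    """Return (indices, scores) of the k training rows most similar to query_vec."""
    q = query_vec / np.linalg.norm(query_vec)
    m = train_matrix / np.linalg.norm(train_matrix, axis=1, keepdims=True)
    scores = m @ q                    # cosine similarity against every row
    idx = np.argsort(-scores)[:k]     # highest similarity first
    return idx, scores[idx]

# Example with random stand-in vectors (hidden size 768).
rng = np.random.default_rng(0)
train_cls = rng.normal(size=(1000, 768)).astype(np.float32)
query_cls = rng.normal(size=768).astype(np.float32)
indices, scores = top_k_similar(query_cls, train_cls, k=5)
print(indices, scores)
```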