google-research / bert

TensorFlow code and pre-trained models for BERT
https://arxiv.org/abs/1810.04805
Apache License 2.0

Why does the pooled_output use just the first token to represent the whole sentence? #196

Open shcup opened 5 years ago

hanxiao commented 5 years ago

Because the first token is [CLS], which is designed to be there and is later fine-tuned on the downstream task. Only after fine-tuning can [CLS], a.k.a. the first token, be a meaningful representation of the whole sentence.

If you are interested in using (pretrained/fine-tuned) BERT for sentence encoding, please refer to my repo: https://github.com/hanxiao/bert-as-service. In particular, [CLS] isn't the only way to represent the sentence; please refer to this answer: https://github.com/hanxiao/bert-as-service#q-what-are-the-available-pooling-strategies
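
For reference, the pooled output in this codebase is just the hidden state of the first ([CLS]) token passed through a dense layer with a tanh activation. A minimal NumPy sketch of that data flow (placeholder shapes and weights, not the actual modeling.py code):

```python
# Sketch of how BERT's pooled_output is derived from the first token ([CLS]):
# take its final hidden state and apply a dense layer with a tanh activation.
import numpy as np

def pooled_output(sequence_output, w, b):
    """sequence_output: [batch, seq_len, hidden]; w: [hidden, hidden]; b: [hidden]."""
    first_token = sequence_output[:, 0, :]   # hidden state of [CLS]
    return np.tanh(first_token @ w + b)      # "pooler" dense layer + tanh

# Toy shapes just to show the data flow.
batch, seq_len, hidden = 2, 8, 4
seq_out = np.random.randn(batch, seq_len, hidden)
w, b = np.random.randn(hidden, hidden), np.zeros(hidden)
print(pooled_output(seq_out, w, b).shape)    # (2, 4)
```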

hanxiao commented 5 years ago

btw, here is a visualization that may help you understand the different BERT layers: https://github.com/hanxiao/bert-as-service#q-so-which-layer-and-which-pooling-strategy-is-the-best
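
The pooling strategies in that FAQ boil down to different reductions over the token dimension. A rough sketch of the common ones (NumPy; the strategy names here are illustrative, not bert-as-service's exact API):

```python
# Rough sketch of common sentence-pooling strategies over BERT token embeddings.
import numpy as np

def pool(token_embeddings, mask, strategy="mean"):
    """token_embeddings: [seq_len, hidden]; mask: [seq_len], 1 for real tokens."""
    if strategy == "cls":                        # first token ([CLS]) only
        return token_embeddings[0]
    valid = token_embeddings[mask.astype(bool)]  # drop padding positions
    if strategy == "mean":                       # average over real tokens
        return valid.mean(axis=0)
    if strategy == "max":                        # element-wise max over real tokens
        return valid.max(axis=0)
    raise ValueError("unknown strategy: %s" % strategy)

emb = np.random.randn(6, 768)
mask = np.array([1, 1, 1, 1, 0, 0])
print(pool(emb, mask, "mean").shape)             # (768,)
```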

hitxujian commented 5 years ago

> Because the first token is [CLS], which is designed to be there and is later fine-tuned on the downstream task. Only after fine-tuning can [CLS], a.k.a. the first token, be a meaningful representation of the whole sentence.
>
> If you are interested in using (pretrained/fine-tuned) BERT for sentence encoding, please refer to my repo: https://github.com/hanxiao/bert-as-service. In particular, [CLS] isn't the only way to represent the sentence; please refer to this answer: https://github.com/hanxiao/bert-as-service#q-what-are-the-available-pooling-strategies

Why do you say that [CLS], a.k.a. the first token, represents the whole sentence only after fine-tuning? Why can't it represent the sentence before fine-tuning?

Traeyee commented 5 years ago

Because BERT is bidirectional, the [CLS] token is encoded with representative information from all tokens through the multi-layer encoding procedure. The representation of [CLS] is therefore specific to each sentence.

KavyaGujjala commented 5 years ago

> Because BERT is bidirectional, the [CLS] token is encoded with representative information from all tokens through the multi-layer encoding procedure. The representation of [CLS] is therefore specific to each sentence.

Hi, how do I get that [CLS] representation after using the run_pretraining.py code on domain-specific text?

I want sentence representations for my downstream tasks.

Any idea on how to do this?

Traeyee commented 5 years ago

> Because BERT is bidirectional, the [CLS] token is encoded with representative information from all tokens through the multi-layer encoding procedure. The representation of [CLS] is therefore specific to each sentence.
>
> Hi, how do I get that [CLS] representation after using the run_pretraining.py code on domain-specific text?
>
> I want sentence representations for my downstream tasks.
>
> Any idea on how to do this?

BERT_BASE_DIR="/home/cuiyi/repos/bert/model/chinese_L-12_H-768_A-12"

python extract_features.py \
  --input_file=./tmp.txt \
  --output_file=./tmp.jsonl \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
  --layers=-1,-2,-3,-4 \
  --max_seq_length=128 \
  --batch_size=8

Modify BERT_BASE_DIR to point to your new model path.
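
If I recall the output format correctly, each line of tmp.jsonl is a JSON object whose "features" list has one entry per token, each with a "layers" list of {"index", "values"} dicts. A small sketch for pulling out the top-layer [CLS] vector per sentence (adjust the keys if your version of extract_features.py writes a different layout):

```python
# Sketch: read extract_features.py output and collect the [CLS] vector
# (layer -1) for each input sentence.
import json

cls_vectors = []
with open("tmp.jsonl") as f:
    for line in f:
        example = json.loads(line)
        first_token = example["features"][0]   # the [CLS] token comes first
        top_layer = first_token["layers"][0]   # layer -1 given the flags above
        cls_vectors.append(top_layer["values"])

print(len(cls_vectors), len(cls_vectors[0]))   # num sentences, hidden size
```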

KavyaGujjala commented 5 years ago

> Because BERT is bidirectional, the [CLS] token is encoded with representative information from all tokens through the multi-layer encoding procedure. The representation of [CLS] is therefore specific to each sentence.
>
> Hi, how do I get that [CLS] representation after using the run_pretraining.py code on domain-specific text? I want sentence representations for my downstream tasks. Any idea on how to do this?
>
> BERT_BASE_DIR="/home/cuiyi/repos/bert/model/chinese_L-12_H-768_A-12"
>
> python extract_features.py --input_file=./tmp.txt --output_file=./tmp.jsonl --vocab_file=$BERT_BASE_DIR/vocab.txt --bert_config_file=$BERT_BASE_DIR/bert_config.json --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt --layers=-1,-2,-3,-4 --max_seq_length=128 --batch_size=8
>
> Modify BERT_BASE_DIR to point to your new model path.

Thanks a lot!!

Have you trained a model and obtained sentence representations? How good was the output? I ask because I have read that the [CLS] token is better after fine-tuning the model.

Traeyee commented 5 years ago

> Because BERT is bidirectional, the [CLS] token is encoded with representative information from all tokens through the multi-layer encoding procedure. The representation of [CLS] is therefore specific to each sentence.
>
> Hi, how do I get that [CLS] representation after using the run_pretraining.py code on domain-specific text? I want sentence representations for my downstream tasks. Any idea on how to do this?
>
> BERT_BASE_DIR="/home/cuiyi/repos/bert/model/chinese_L-12_H-768_A-12"
>
> python extract_features.py --input_file=./tmp.txt --output_file=./tmp.jsonl --vocab_file=$BERT_BASE_DIR/vocab.txt --bert_config_file=$BERT_BASE_DIR/bert_config.json --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt --layers=-1,-2,-3,-4 --max_seq_length=128 --batch_size=8
>
> Modify BERT_BASE_DIR to point to your new model path.
>
> Thanks a lot!!
>
> Have you trained a model and obtained sentence representations? How good was the output? I ask because I have read that the [CLS] token is better after fine-tuning the model.

Not yet, but many people have used this as a basic step in their own work.

chikubee commented 5 years ago

> Because BERT is bidirectional, the [CLS] token is encoded with representative information from all tokens through the multi-layer encoding procedure. The representation of [CLS] is therefore specific to each sentence.

Hey, can you explain a little more how it captures the entire sentence's meaning? I fine-tuned the BERT uncased small model for a text classification task.

I wanted to use the last-layer representation of the [CLS] token to understand the false positives. For instance, I thought that looking at the most similar representations from the training set would give me some insight into the wrong results. But the top-k similar representations I get are not really similar.

Everywhere it is mentioned that the [CLS] token representation works for the fine-tuned task. It works for my task, and the accuracy is good. But when interpreting the similar sentences, the story is otherwise.

What do you think? Thanks in advance
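
For anyone reproducing the top-k lookup described above, a minimal sketch of cosine-similarity retrieval over [CLS] vectors (assumes you already have one vector per training sentence, e.g. from extract_features.py; variable names are illustrative):

```python
# Sketch: find the k training sentences whose [CLS] vectors are most similar
# to a query sentence's [CLS] vector, by cosine similarity.
import numpy as np

def top_k_similar(query_vec, train_vecs, k=5):
    q = query_vec / np.linalg.norm(query_vec)
    t = train_vecs / np.linalg.norm(train_vecs, axis=1, keepdims=True)
    sims = t @ q                      # cosine similarity to every training sentence
    idx = np.argsort(-sims)[:k]       # indices of the k most similar sentences
    return idx, sims[idx]

train_vecs = np.random.randn(100, 768)   # placeholder [CLS] vectors
query_vec = np.random.randn(768)
print(top_k_similar(query_vec, train_vecs, k=3))
```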