Hi, thanks for publishing this work. Specifically for the text entailment task, how do we use your model when we need to feed in two sentences?
Hi @rezasugiarto: It is the same as the common way BERT handles pair-sequence input data. Basically, it is done with the following format:

`[CLS]<Text_1>[SEP]<Text_2>[SEP]`

where `[CLS]` denotes the classification token, `[SEP]` denotes the separator token, and `<Text_1>` and `<Text_2>` denote the text pair. Following the original BERT model, we also add a different `token_type` embedding for `<Text_1>` and `<Text_2>`.

For simplicity, in order to create the aforementioned format and token type ids, we can use the `BertTokenizer.encode_plus()` function as shown in:
https://github.com/indobenchmark/indonlu/blob/a698339222a5c214e6a693e81f8c66785cf35477/utils/data_utils.py#L463

For further details on the BERT model, you can read the original BERT paper: https://arxiv.org/abs/1810.04805
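For illustration, here is a minimal sketch (not taken from the repo) of how `encode_plus()` produces this format and the segment ids, assuming the HuggingFace `transformers` library; the `indobenchmark/indobert-base-p1` checkpoint name is an assumption, so substitute whichever checkpoint you actually use:

```python
from transformers import BertTokenizer

# Assumed checkpoint; replace with the model you are fine-tuning.
tokenizer = BertTokenizer.from_pretrained("indobenchmark/indobert-base-p1")

text_1 = "Seorang pria sedang membaca koran."  # premise (<Text_1>)
text_2 = "Pria itu sedang membaca."            # hypothesis (<Text_2>)

encoded = tokenizer.encode_plus(
    text_1,
    text_2,
    add_special_tokens=True,    # inserts [CLS] and the two [SEP] tokens
    return_token_type_ids=True, # 0s for the <Text_1> segment, 1s for <Text_2>
)

print(encoded["input_ids"])       # ids for [CLS]<Text_1>[SEP]<Text_2>[SEP]
print(encoded["token_type_ids"])  # segment ids distinguishing the two texts
```

Passing the two texts as separate positional arguments is what triggers the pair encoding; the returned `token_type_ids` are what feed the different `token_type` embeddings mentioned above.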
Clear enough, thank you