fastnlp / fastNLP

fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.
https://gitee.com/fastnlp/fastNLP
Apache License 2.0
3.06k stars, 450 forks

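BERT embedding uses all zeros as the token type by default. For tasks such as QA, will this be changed so that the first sentence is encoded as 0 and the second as 1? #297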

Closed: onebula closed this issue 4 years ago

onebula commented 4 years ago

https://github.com/fastnlp/fastNLP/blob/4e95989e973f59b2ecb7f718647257e8b6fea0c7/fastNLP/modules/encoder/bert.py#L240

xuyige commented 4 years ago

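For each input text, if it contains the [SEP] token, token_type_ids is set to alternate between 0 and 1 (0, 1, 0, 1, ...), switching at each [SEP]. If the text contains no [SEP] token, the all-zero sentence encoding is used by default.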
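To illustrate the behaviour described above, here is a minimal pure-Python sketch. It is not the fastNLP implementation: the function name `alternating_token_type_ids` is hypothetical, and it assumes each [SEP] token keeps the segment id of the sentence it closes, with the id flipping for the tokens that follow.

```python
def alternating_token_type_ids(tokens, sep_token="[SEP]"):
    """Assign 0/1 segment ids that flip after every separator token.

    Hypothetical helper for illustration only; assumes the separator
    itself belongs to the segment it terminates.
    """
    ids = []
    current = 0
    for tok in tokens:
        ids.append(current)
        if tok == sep_token:
            # Switch segment for everything after this separator.
            current = 1 - current
    return ids


# A QA-style pair: question, [SEP], answer context, [SEP].
pair = ["[CLS]", "who", "wrote", "it", "[SEP]", "the", "author", "[SEP]"]
print(alternating_token_type_ids(pair))    # [0, 0, 0, 0, 0, 1, 1, 1]

# No separator at all: every token stays in segment 0.
single = ["a", "plain", "sentence"]
print(alternating_token_type_ids(single))  # [0, 0, 0]
```

With a third segment the ids would return to 0, which matches the "0101 alternating" description rather than a strictly two-sentence scheme.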

onebula commented 4 years ago

Thanks. Could you point out which part of the code does this?

xuyige commented 4 years ago

> Thanks. Could you point out which part of the code does this?

https://github.com/fastnlp/fastNLP/blob/master/fastNLP/embeddings/bert_embedding.py#L433

onebula commented 4 years ago

thx