dhlee347 / pytorchic-bert

Pytorch Implementation of Google BERT
Apache License 2.0
591 stars, 179 forks

Padding bugs on data preprocess #10

Closed: AppleHolic closed this issue 5 years ago

AppleHolic commented 5 years ago

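On this code line, the pad index 0 is the same as the index of the first segment. As a result, padded positions cannot be told apart from first-segment tokens, so the segment information may not be represented exactly.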
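To make the concern concrete, here is a minimal sketch of right-padding segment ids with 0. The helper `pad_to_length` and the example ids are hypothetical and are not taken from the repository's preprocessing code.

```python
def pad_to_length(ids, max_len, pad_id=0):
    """Right-pad a list of ids with pad_id up to max_len."""
    return ids + [pad_id] * (max_len - len(ids))

# Hypothetical example: 3 tokens in segment 0, then 2 tokens in segment 1.
segment_ids = [0, 0, 0, 1, 1]
input_mask = [1] * len(segment_ids)  # 1 = real token

padded_segments = pad_to_length(segment_ids, 8)
padded_mask = pad_to_length(input_mask, 8)

print(padded_segments)  # [0, 0, 0, 1, 1, 0, 0, 0]
print(padded_mask)      # [1, 1, 1, 1, 1, 0, 0, 0]
```

In `padded_segments` alone, the three trailing zeros look like first-segment tokens; only `padded_mask` marks them as padding.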

dhlee347 commented 5 years ago

I merged your pull requests.

yeachan-kr commented 5 years ago

The pad index 0 does not affect accuracy, since the padded positions are masked. Furthermore, this change causes an out-of-bounds problem when loading the pre-trained models.
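Both points can be illustrated with a small pure-Python sketch. It assumes the pre-trained checkpoint has a segment embedding table with exactly two rows (segments 0 and 1), and that attention masking subtracts a large constant from the scores of padded positions before the softmax. The toy values and the helpers `lookup` and `masked_softmax` are hypothetical, not the repository's code.

```python
import math

# Assumption: pre-trained segment embeddings have two rows (segment 0 and 1).
segment_table = [[0.1, 0.2], [0.3, 0.4]]  # toy values

def lookup(table, ids):
    """Return the embedding row for each id."""
    return [table[i] for i in ids]

def masked_softmax(scores, mask):
    """Softmax in which positions with mask == 0 receive zero weight."""
    adjusted = [s - 10000.0 * (1 - m) for s, m in zip(scores, mask)]
    top = max(adjusted)
    exps = [math.exp(a - top) for a in adjusted]
    total = sum(exps)
    return [e / total for e in exps]

# 1) Using a separate pad index (here 2) overflows the two-row table.
try:
    lookup(segment_table, [0, 1, 2])
    out_of_bounds = False
except IndexError:
    out_of_bounds = True

# 2) Padded positions (mask 0) get no attention weight, whatever their segment id.
weights = masked_softmax([1.0, 2.0, 0.5, 0.5], [1, 1, 0, 0])
```

Under these assumptions, a third segment index cannot be looked up in a two-row table, while padded positions padded with 0 contribute nothing to the attention output.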