google-research / albert

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Apache License 2.0

Problem with the description of attention_mask #208

Open liushaoweihua opened 4 years ago

liushaoweihua commented 4 years ago

https://github.com/google-research/albert/blob/a41cf11700c1ed2b7beab0a2649817fa52c8d6e1/modeling.py#L838-L860

There may be a problem with the shape of attention_mask described in L857~L860 of modeling.py; it should be [batch_size, from_seq_length].
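For reference, here is a minimal sketch (not the repository code) of how a 2-D mask of shape [batch_size, from_seq_length] could be broadcast into an additive attention bias before the softmax; the helper name additive_mask_bias and its signature are illustrative assumptions, not part of modeling.py.

```python
import tensorflow as tf


def additive_mask_bias(attention_mask, from_seq_length):
  """Hypothetical helper: expand a [batch_size, seq_length] mask into an
  additive bias of shape [batch_size, from_seq_length, seq_length].

  Assumes self-attention, so from_seq_length equals the mask's seq_length.
  """
  mask = tf.cast(attention_mask, tf.float32)      # [B, T]
  mask = tf.expand_dims(mask, axis=1)             # [B, 1, T]
  mask = tf.tile(mask, [1, from_seq_length, 1])   # [B, F, T]
  # 1.0 -> attend (bias 0.0), 0.0 -> masked (bias -10000.0)
  return (1.0 - mask) * -10000.0
```

Adding a bias like this to raw attention scores of shape [batch_size, from_seq_length, to_seq_length] before the softmax effectively removes the masked positions, which is why a 2-D [batch_size, from_seq_length] mask would be sufficient as the documented input shape.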