bhoov / exformer

An Explorable Transformer. A lite version of exBERT
http://exbert.net/
Apache License 2.0

Fix faulty offset with `distilgpt2` (and other autoregressive models) #5

Closed: bhoov closed this issue 4 years ago

bhoov commented 4 years ago

`distilgpt2` seems to occasionally offset the attention head boxes incorrectly:

[image]

This happens whenever "Hide Special Tokens" is toggled or the layer is changed on an autoregressive model.
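For readers unfamiliar with this class of bug, here is a minimal sketch of how hiding special tokens can shift box positions. It is an illustration only: the thread does not say what caused the offset or what the fix changed. The names `SPECIAL_TOKENS`, `visible_indices`, and `box_offsets`, and the `box_width` value, are all hypothetical and not taken from the exformer code. The idea is that a BERT-style sequence starts with a special token while a GPT-style sequence usually does not, so any position shift has to be derived from the tokens actually hidden and not from a fixed constant.

```python
from typing import Dict, List, Sequence

# Hypothetical special-token set for illustration; real tokenizers define their own.
SPECIAL_TOKENS = {"[CLS]", "[SEP]", "<|endoftext|>"}


def visible_indices(tokens: Sequence[str]) -> List[int]:
    """Return the original positions of tokens that remain after hiding special tokens."""
    return [i for i, tok in enumerate(tokens) if tok not in SPECIAL_TOKENS]


def box_offsets(tokens: Sequence[str], box_width: float = 10.0) -> Dict[int, float]:
    """Map each visible token's original index to the x offset of its box.

    The offset comes from the token's rank among the visible tokens,
    so it stays correct whether or not the sequence begins with a special token.
    """
    return {orig: col * box_width for col, orig in enumerate(visible_indices(tokens))}


bert_like = ["[CLS]", "the", "cat", "[SEP]"]  # special tokens at both ends
gpt_like = ["the", "cat", "sat"]              # no special tokens added

print(box_offsets(bert_like))
print(box_offsets(gpt_like))
```

A fixed shift of one position, which would be correct for `bert_like`, would place the first box of `gpt_like` at a negative offset. Computing the offset from the visible tokens avoids that.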

bhoov commented 4 years ago

Fixed by 95fa2c8625712fce5ad2166766ed403bde40d136