Closed bhoov closed 4 years ago
distilgpt2 seems to occasionally offset the attention head boxes incorrectly.
This happens whenever "Hide Special Tokens" is toggled or the layer is changed on an autoregressive model.
Fixed by 95fa2c8625712fce5ad2166766ed403bde40d136