THUDM / SwissArmyTransformer

SwissArmyTransformer is a flexible and powerful library to develop your own Transformer variants.
https://THUDM.github.io/SwissArmyTransformer
Apache License 2.0

During BERT decoding, past_key_values is used to accelerate calculation. Do we have a similar implementation? #129

Open etrigger opened 11 months ago

etrigger commented 11 months ago

I did not find a comparable caching mechanism using past_key_values in SAT. Would it be possible to add one? Thanks.

Sleepychord commented 11 months ago

Yes, and it's even simpler. You can just do model.add_mixin('auto-regressive', CachedAutoregressiveMixin()). You don't need to consider past_key_values when implementing the model (in most cases); this mixin and filling_sequence (the autoregressive API) will maintain the cache for you.

For a complete example, see the llama inference example.
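
A minimal sketch of that workflow, based on the patterns in the repo's inference examples. The import paths, the `-1` placeholder convention, and the `BaseStrategy` arguments are assumptions and vary across sat versions; `model`, `prompt_ids`, and `max_new_tokens` are stand-ins for your own setup:

```python
# Sketch: cached autoregressive decoding via SAT's mixin API.
# NOTE: import paths and argument names are assumptions based on the
# sat examples and may differ between versions.
import torch
from sat.model.mixins import CachedAutoregressiveMixin
from sat.generation.autoregressive_sampling import filling_sequence
from sat.generation.sampling_strategies import BaseStrategy

# `model` is assumed to be an already-built/loaded sat model.
# The mixin adds KV caching, so the model code itself never needs
# to handle past_key_values.
model.add_mixin('auto-regressive', CachedAutoregressiveMixin())

prompt_ids = [1, 2, 3]   # placeholder token ids from your tokenizer
max_new_tokens = 32

# Prompt tokens followed by -1 placeholders marking the positions
# that filling_sequence should generate.
seq = torch.tensor(prompt_ids + [-1] * max_new_tokens,
                   dtype=torch.long, device=model.parameters().__next__().device)

strategy = BaseStrategy(temperature=0.7, top_k=40)  # args vary by version
# Returns the completed sequence(s); the exact return structure
# also varies by version, so check the example you are following.
output = filling_sequence(model, seq, batch_size=1, strategy=strategy)
```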