Open etrigger opened 11 months ago
Yes, and it is simpler than that.
You can just do this:
model.add_mixin('auto-regressive', CachedAutoregressiveMixin())
In most cases you don't need to handle past_key_values yourself when implementing the model, because this mixin and filling_sequence (the autoregressive API) will save the cache for you.
For an example, see the llama inference example.
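To make the idea of "saving the cache" concrete, here is a minimal, library-free sketch of key/value caching during autoregressive decoding. It is not SAT's implementation: `ToyKVCache`, `attend`, and `full_recompute` are hypothetical names made up for illustration, and the projections are toy arithmetic rather than learned weights. It only shows that attending over cached keys/values step by step gives the same result as recomputing causal attention from scratch.

```python
import math


def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]


class ToyKVCache:
    """Hypothetical stand-in for the state a caching mixin keeps between steps."""

    def __init__(self):
        self.keys = []
        self.values = []

    def step(self, query, key, value):
        # Store only the new token's key/value, then attend over everything cached.
        self.keys.append(key)
        self.values.append(value)
        return attend(query, self.keys, self.values)


def full_recompute(queries, keys, values):
    """Causal attention without a cache: position t re-reads positions 0..t."""
    return [attend(queries[t], keys[:t + 1], values[:t + 1])
            for t in range(len(queries))]


# Toy per-token projections (assumption: a real model uses learned weights).
tokens = [[0.1, 0.3], [0.5, -0.2], [-0.4, 0.9], [0.7, 0.7]]
queries = [[2 * x for x in t] for t in tokens]
keys = [[x + 0.1 for x in t] for t in tokens]
values = [[-x for x in t] for t in tokens]

cache = ToyKVCache()
cached_outputs = [cache.step(q, k, v) for q, k, v in zip(queries, keys, values)]
reference = full_recompute(queries, keys, values)
```

In this sketch the caller passes each token's key and value exactly once; the cache object carries them forward, which is the bookkeeping the reply above says the mixin handles so the model code does not have to.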
I did not find a cached method that uses past_key_values in SAT. Is it possible to add this? Thanks.