FasterDecoding / REST

REST: Retrieval-Based Speculative Decoding, NAACL 2024
Apache License 2.0

Questions about past_key_values_data #6

Closed reflectionie closed 6 months ago

reflectionie commented 6 months ago

Hi, thank you for your wonderful work! May I ask why the code updates the past_key_values_data variable? The relevant line is here: https://github.com/FasterDecoding/REST/blob/5b119c1d2c318549bf6ef45aaba126b9e7d59529/rest/model/utils.py#L325 This variable doesn't seem to be involved when using the KV cache, yet it is still created and updated during decoding.

zhenyuhe00 commented 6 months ago

Thanks for the question.
https://github.com/FasterDecoding/REST/blob/5b119c1d2c318549bf6ef45aaba126b9e7d59529/rest/model/utils.py#L325-L329 In the code above, tgt is copied into dst. Since dst is a slice view of the original past_key_values_data, writing to dst also updates past_key_values_data in place.
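To illustrate the view behaviour, here is a minimal PyTorch sketch. It is not the code from rest/model/utils.py. The buffer shape, prev_len, and select are made-up values for illustration; only the names tgt and dst follow the explanation above.

```python
import torch

# Hypothetical stand-in for past_key_values_data: (layers, max_len, head_dim).
buffer = torch.zeros(2, 8, 4)

prev_len = 3  # assumed number of tokens already committed to the cache

# Pretend a forward pass wrote 5 candidate entries at positions 3..7,
# filled with the values 0..4 so they are easy to tell apart.
buffer[:, prev_len:, :] = torch.arange(5, dtype=torch.float32).view(1, 5, 1)

# Assumed absolute positions of the accepted candidates.
select = torch.tensor([4, 6])

# Indexing with a tensor (advanced indexing) returns a copy.
tgt = buffer[:, select, :]

# Basic slicing returns a view that shares memory with buffer.
dst = buffer[:, prev_len:prev_len + tgt.shape[1], :]

# In-place copy into the view, so buffer itself changes.
dst.copy_(tgt, non_blocking=True)

new_len = prev_len + tgt.shape[1]
print(buffer[0, :new_len, 0])
```

After the copy, the accepted entries sit contiguously right after the first prev_len positions of buffer, even though buffer is never assigned to directly.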

https://github.com/FasterDecoding/REST/blob/5b119c1d2c318549bf6ef45aaba126b9e7d59529/rest/model/kv_cache.py#L92-L100 The past_key_values_data is created in advance, sized for the max sequence length, for faster memory management (static cache). However, you can also change it to a dynamic cache, which appends each new past_key_value on the fly.
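The difference between the two strategies can be sketched roughly as below. StaticCache and DynamicCache are hypothetical toy classes written for this comment, not the classes in rest/model/kv_cache.py, and they handle a single 2-D tensor rather than per-layer keys and values.

```python
import torch


class StaticCache:
    """Toy static cache: one buffer allocated up front, filled in place."""

    def __init__(self, max_len: int, dim: int):
        self.data = torch.zeros(max_len, dim)  # allocated once
        self.length = 0                        # number of valid rows

    def append(self, new: torch.Tensor) -> torch.Tensor:
        n = new.shape[0]
        # Write into a slice view; no new allocation happens here.
        self.data[self.length:self.length + n].copy_(new)
        self.length += n
        return self.data[:self.length]


class DynamicCache:
    """Toy dynamic cache: grows by concatenation on every append."""

    def __init__(self, dim: int):
        self.data = torch.zeros(0, dim)

    def append(self, new: torch.Tensor) -> torch.Tensor:
        # torch.cat allocates a new tensor and copies the old contents.
        self.data = torch.cat([self.data, new], dim=0)
        return self.data


static, dynamic = StaticCache(max_len=16, dim=4), DynamicCache(dim=4)
ptr_before = static.data.data_ptr()
for step in range(3):
    kv = torch.full((2, 4), float(step))
    s_out, d_out = static.append(kv), dynamic.append(kv)

same_contents = torch.equal(s_out, d_out)
same_buffer = static.data.data_ptr() == ptr_before
```

Both caches end up with the same contents. The static one keeps writing into the same preallocated memory, while the dynamic one reallocates and copies on each step, which is the cost the preallocation avoids.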

reflectionie commented 6 months ago

Thanks for the explanation, I totally understand now!