alipay / PainlessInferenceAcceleration

Creative Commons Attribution 4.0 International

Consultation on Trie Tree Maintenance? #15

Closed ZipECHO closed 7 months ago

ZipECHO commented 8 months ago

Hi, I noticed that the Tree class also has two modes (input and output). Could you please explain why these two modes are needed, and which operations on the Trie tree they correspond to?

chenliangjyj commented 8 months ago

After generation for the current query finishes, we release the nodes tagged with the input mode. We assume that the benefit of the current query's prompt to the next query is smaller than the steadily growing time cost of keeping it.
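
For readers following along, the release step described above can be sketched with a toy trie. This is only an illustration under stated assumptions, not the repository's code: the names ToyTrie, put, release_input and contains are hypothetical, and per-mode counters are assumed as the way nodes carry their mode tag.

```python
class Node:
    def __init__(self):
        self.children = {}  # token id -> Node
        self.freqs = {}     # mode tag ("input" / "output") -> count


class ToyTrie:
    """Hypothetical toy trie; not the Tree class from the repository."""

    def __init__(self):
        self.root = Node()

    def put(self, tokens, mode):
        # Walk or create the branch and count the visit under the given mode.
        node = self.root
        for tok in tokens:
            node = node.children.setdefault(tok, Node())
            node.freqs[mode] = node.freqs.get(mode, 0) + 1

    def release_input(self):
        # Drop every "input" count, then prune nodes that nothing else supports.
        self._release(self.root)

    def _release(self, node):
        for tok in list(node.children):
            child = node.children[tok]
            child.freqs.pop("input", None)
            self._release(child)
            if not child.freqs and not child.children:
                del node.children[tok]

    def contains(self, tokens):
        node = self.root
        for tok in tokens:
            node = node.children.get(tok)
            if node is None:
                return False
        return True


trie = ToyTrie()
trie.put([1, 2, 3], mode="input")    # prompt tokens of the current query
trie.put([1, 2, 4], mode="output")   # generated tokens
trie.release_input()                 # called once generation has finished
```

After release_input, the branch that only the prompt contributed (1, 2, 3) is gone, while the prefix shared with the generated branch (1, 2, 4) stays because it still has an output count.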

ZipECHO commented 8 months ago

Thank you for your reply. Do you mean that you add the nodes of the current query's prompt in add before inference, and then remove these nodes in remove1 or remove2 after inference is done? I am not sure which one corresponds to the release process. Besides, could you explain the function of stream_put in lookahead_cache? Thanks!

zheyishine commented 8 months ago

Q1: Yes.
Q2: We remove the nodes of prompts here and here.
Q3: stream_put puts generated tokens into lookahead_cache as early as possible instead of waiting for the final step (i.e., the put function). It uses a buffer to accumulate tokens up to the length of decoding_length so that a branch is not broken. stream_put works better than put when a response contains repeated token pieces.
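
The buffering idea in Q3 can be sketched roughly as follows. This is a toy under loud assumptions: ToyStreamCache, branch_length and the final flag are hypothetical names, a plain list stands in for the trie, and the sliding-window carry-over is one possible way to avoid cutting a branch at a chunk boundary, not necessarily what lookahead_cache does.

```python
class ToyStreamCache:
    """Hypothetical stand-in for a cache that accepts tokens while decoding."""

    def __init__(self, branch_length):
        self.branch_length = branch_length  # plays the role of decoding_length
        self.branches = []                  # stands in for the trie
        self.buffer = []                    # tokens not yet long enough to insert

    def stream_put(self, new_tokens, final=False):
        self.buffer.extend(new_tokens)
        n = self.branch_length
        # Insert a branch only once a full-length window is available, so a
        # branch is never cut short just because a decoding step ended.
        while len(self.buffer) >= n:
            self.branches.append(tuple(self.buffer[:n]))
            self.buffer.pop(0)
        if final:
            # At the end of the response, flush the remaining shorter tail.
            if self.buffer:
                self.branches.append(tuple(self.buffer))
            self.buffer = []


cache = ToyStreamCache(branch_length=3)
cache.stream_put([1, 2])              # too short: nothing is inserted yet
cache.stream_put([3, 4])              # windows (1, 2, 3) and (2, 3, 4) become available
cache.stream_put([], final=True)      # tail (3, 4) is flushed
```

Because branches become available while the response is still being generated, a later part of the same response can already match an earlier part, which is consistent with the remark that stream_put helps when a response repeats token pieces.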

ZipECHO commented 8 months ago

Thank you very much~ I clearly understand this part now.

ZipECHO commented 7 months ago

Hi guys, I have some more questions about Trie tree maintenance.

  1. Why are there two kinds of trees, _update_trees and _update_input_trees, and what is the functional difference between them?

  2. I guess you add prompts to _update_input_trees and release them when the inference for a prompt finishes, while the inference results are added to _update_trees, and this set is squeezed and released once its length exceeds 1024 in here. Do I understand correctly?

  3. Besides, will these release actions affect the mem (remove nodes or update freqs), or do they just affect _update_trees and _update_input_trees?

Thank you very much!

zheyishine commented 7 months ago

Feel free to ask any questions.

  1. _update_trees is used to squeeze overfat trees, while _update_input_trees is used to reset the frequency of input (prompt) tokens.
  2. Partly correct. Both the prompts and the inference results are added to _update_trees. For better performance, we squeeze only once its size reaches 1024: a tree that is updated several times before the size reaches 1024 is squeezed and counted only once.
  3. These actions will affect the mem.
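
The deferred squeeze from answer 2 can be illustrated with a small sketch. Everything here is an assumption made for illustration: ToyMem, max_pending, max_branches and squeeze_pending are hypothetical names, a Counter of branches stands in for each tree, and "squeeze" is simplified to keeping the most frequent branches. The reset of input frequencies via _update_input_trees is not modelled.

```python
from collections import Counter


class ToyMem:
    """Hypothetical sketch of batching squeezes; not the repository's code."""

    def __init__(self, max_pending=1024, max_branches=2):
        self.mem = {}              # first token -> Counter of branches (a toy "tree")
        self.pending = set()       # plays the role of _update_trees
        self.max_pending = max_pending
        self.max_branches = max_branches
        self.squeeze_calls = Counter()  # how often each tree was squeezed

    def put(self, branch):
        key = branch[0]
        self.mem.setdefault(key, Counter())[tuple(branch)] += 1
        # A set records each updated tree once, however often it is updated.
        self.pending.add(key)
        if len(self.pending) >= self.max_pending:
            self.squeeze_pending()

    def squeeze_pending(self):
        # Squeeze every pending tree exactly once, directly inside mem.
        for key in self.pending:
            kept = self.mem[key].most_common(self.max_branches)
            self.mem[key] = Counter(dict(kept))
            self.squeeze_calls[key] += 1
        self.pending.clear()


m = ToyMem(max_pending=2, max_branches=1)
m.put([1, 2])
m.put([1, 2])
m.put([1, 3])   # tree 1 updated three times, still one pending entry
m.put([5, 6])   # second pending tree reaches the threshold and triggers the squeeze
```

In this toy, tree 1 is updated three times but squeezed only once, and the squeeze mutates mem itself, which matches answer 3 above.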

ZipECHO commented 7 months ago

Thanks~