Closed shifeiwen closed 9 months ago
In the function that calculates whether to match, why is the value of random.random used to determine whether the match is successful?
Thank you for your interest.
Which means that the length of seq_len in the decode layer is 26 each time.
26 refers to the number of nodes in the generation tree. We haven’t fine-tuned the structure and length of the generation tree; it's based more on intuition: branches for high-probability tokens should be deeper. Using a smaller tree might be more effective on edge devices.
I am wondering if I could reduce the length of some candidate words, or set this parameter as a hyperparameter.
You can reduce the length of some candidate words. The structure of the tree (which determines seq_len) is a hyperparameter, and you can easily adjust it. This hyperparameter is located in model/choices.py. The list mc_sim_7b_63 represents the structure of the tree, where each sublist corresponds to a node in the tree and describes the path to that node: each entry is the rank of the token chosen at that depth. For example, considering the query "I", [0] corresponds to the most probable token "am" following "I", and [0,1] corresponds to the second most probable token "grateful" following "I am".
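To make the path encoding concrete, here is a minimal sketch of how such a list could be inspected and shrunk. The list toy_tree and the helpers count_paths, is_valid_tree, and prune_tree are hypothetical names made up for illustration; they are not part of model/choices.py, and toy_tree is not the real mc_sim_7b_63.

```python
from typing import List

Path = List[int]

# Hypothetical toy tree, NOT the real mc_sim_7b_63 list.
# Each sublist is a path from the root: entry i is the rank (0 = most probable)
# of the token chosen at depth i + 1.
toy_tree: List[Path] = [[0], [1], [2], [0, 0], [0, 1], [1, 0], [0, 0, 0]]


def count_paths(tree: List[Path]) -> int:
    """Number of non-root nodes, i.e. one per sublist."""
    return len(tree)


def is_valid_tree(tree: List[Path]) -> bool:
    """Every node deeper than depth 1 must have its parent path in the list."""
    paths = {tuple(p) for p in tree}
    return all(len(p) == 1 or tuple(p[:-1]) in paths for p in tree)


def prune_tree(tree: List[Path], max_depth: int, max_rank: int) -> List[Path]:
    """Keep only paths that are at most max_depth long and use ranks < max_rank."""
    return [
        p for p in tree
        if len(p) <= max_depth and all(rank < max_rank for rank in p)
    ]


small_tree = prune_tree(toy_tree, max_depth=2, max_rank=2)
print(count_paths(toy_tree), count_paths(small_tree))
print(small_tree, is_valid_tree(small_tree))
```

Pruning by depth and rank keeps the list parent-closed, because the parent of any kept path is shorter and uses the same ranks. A smaller list means fewer candidate tokens per decoding step, which is the trade-off discussed above for edge devices.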
In the function that calculates whether to match, why is the value of random.random used to determine whether the match is successful?
You can find the pseudocode corresponding to this part of the code in our blog. It recursively applies speculative sampling, which accepts or rejects each candidate token at random. The proof that speculative sampling leaves the output distribution unchanged can be found in Appendix A of the paper.
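For readers wondering where the random number comes in, below is a minimal sketch of the standard single-token speculative sampling test, not the repository's actual code. The names accept_draft_token and residual_distribution are hypothetical. The assumption is that the draft model proposed a token with probability q_draft and the target model assigns it probability p_target.

```python
import random
from typing import List


def accept_draft_token(p_target: float, q_draft: float, rng: random.Random) -> bool:
    """Accept the drafted token with probability min(1, p_target / q_draft).

    q_draft must be positive, which holds for any token actually sampled
    from the draft distribution.
    """
    r = rng.random()  # uniform in [0.0, 1.0)
    return r < min(1.0, p_target / q_draft)


def residual_distribution(p: List[float], q: List[float]) -> List[float]:
    """Distribution to resample from after a rejection: normalize max(0, p - q)."""
    raw = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    total = sum(raw)
    # If p and q are identical the residual is empty; fall back to p.
    return [x / total for x in raw] if total > 0 else list(p)


rng = random.Random(0)
trials = 10_000
accepted = sum(accept_draft_token(0.3, 0.6, rng) for _ in range(trials))
print(accepted / trials)  # close to 0.3 / 0.6 = 0.5
```

Comparing a uniform random number against the ratio is what makes the acceptance probabilistic: tokens the draft model over-proposes are sometimes rejected, and resampling from the residual restores the probability mass the draft model under-proposed.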
If you have any further questions, please feel free to continue commenting.
Thanks for your reply. I will do some experiments.
Thank you for your excellent work. I have read your code and have some questions about candidate tokens. In the code, I saw that the number of nodes in the designed tree is 26, which means that the length of seq_len in the decode layer is 26 each time. I don't know if this 26 has any special meaning or is the best result from an experiment. On some edge devices, or when computing power is insufficient, a long seq_len combined with a low hit rate will cause performance degradation, so I am wondering if I could reduce the length of some candidate words, or set this parameter as a hyperparameter. I hope to hear some of your thoughts. @Liyuhui-12