dutsc opened this issue 9 months ago (status: Open)
I checked the traverse_verify_tree() function in /FlexFlow/src/runtime/request_manager.cc and found that it only checks whether the tokens are equal. Does this mean that SpecInfer's default implementation is the VERIFYGREEDY function of Algorithm 2 in the paper?
Algorithm 2 in the paper:
This pseudocode walks from the root of the token tree toward a leaf, keeping the path whose tokens match the verify model's output.
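The greedy walk described above can be sketched in Python as follows. This is a minimal illustration, not SpecInfer's code: `Node`, `verify_greedy`, and the `llm_argmax` callback are hypothetical names, and the callback stands in for the verify model's greedy next-token prediction (a real system would obtain all of these predictions from one tree-parallel forward pass).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class Node:
    """Hypothetical token-tree node: one speculated token plus its children."""
    token: int
    children: List["Node"] = field(default_factory=list)


def verify_greedy(root: Node, llm_argmax: Callable[[Sequence[int]], int]) -> List[int]:
    """Follow the child that matches the verifier's top token at each step.

    `llm_argmax(prefix)` is an assumed callback returning the verify model's
    greedy next token for `prefix`. The root token counts as already verified.
    Returns the accepted tokens after the root, ending with one extra token
    produced by the verifier itself.
    """
    verified: List[int] = []
    prefix = [root.token]
    node = root
    while True:
        target = llm_argmax(prefix)
        verified.append(target)
        prefix.append(target)
        # Keep walking only if the draft tree speculated the same token.
        match = next((c for c in node.children if c.token == target), None)
        if match is None:
            return verified
        node = match


# Toy verifier: a lookup table from prefix to next token.
table = {(1,): 5, (1, 5): 8, (1, 5, 8): 9}
root = Node(1, [Node(5, [Node(7), Node(8)]), Node(6)])
print(verify_greedy(root, lambda p: table[tuple(p)]))  # [5, 8, 9]
```

Tokens 5 and 8 are accepted because the tree speculated them, and 9 is the verifier's own next token once no child matches.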
traverse_verify_tree() code snippet:
for (int i = 0; i < outputSerializedTree.size(); i++) {
  auto input = inputSerializedTree.at(i);
  auto output = outputSerializedTree.at(i);
  if (i == 0) {
    verifiedTree.push_back(output);
    new_committed_tokens.push_back(std::make_pair(
        input.second,
        committed_tokens.at(guid).at(i).second)); // <input_abs_depth,
                                                  // input_index_in_batch>
    // std::cout << committed_tokens.at(guid).at(i).first << ", "
    //           << committed_tokens.at(guid).at(i).second << std::endl;
    // std::cout << input.first << ", " << input.second << std::endl;
    assert(committed_tokens.at(guid).at(i).first == input.second);
    continue;
  }
  if (input.first == verifiedTree.back().first &&
      input.second == verifiedTree.back().second) { // input == verifiedTree.back()
    verifiedTree.push_back(output);
    new_committed_tokens.push_back(std::make_pair(
        input.second,
        committed_tokens.at(guid).at(i).second)); // <input_abs_depth,
                                                  // input_index_in_batch>
    assert(committed_tokens.at(guid).at(i).first == input.second);
  }
}
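To make the control flow of this loop easier to follow, here is a rough Python mirror of it. It rests on assumptions that the snippet alone does not confirm: each serialized entry is taken to be a `(token_id, abs_depth)` pair, and output entry `i` is taken to hold the verify model's prediction for the position right after input node `i`. The `committed_tokens` bookkeeping is left out, and `traverse_verify` is a hypothetical name.

```python
from typing import List, Tuple

Entry = Tuple[int, int]  # assumed layout: (token_id, abs_depth)


def traverse_verify(
    inputs: List[Entry], outputs: List[Entry]
) -> Tuple[List[Entry], List[int]]:
    """Scan the serialized tree once, keeping nodes that equal the last verified output."""
    assert len(inputs) == len(outputs)
    verified: List[Entry] = []
    accepted_indices: List[int] = []
    for i, (inp, out) in enumerate(zip(inputs, outputs)):
        # The root (i == 0) is always accepted; any later node is accepted only
        # if it equals the verifier's most recent prediction.
        if i == 0 or inp == verified[-1]:
            verified.append(out)
            accepted_indices.append(i)
    return verified, accepted_indices


inputs = [(1, 0), (5, 1), (7, 2), (8, 2), (6, 1)]
outputs = [(5, 1), (8, 2), (3, 3), (9, 3), (2, 2)]
print(traverse_verify(inputs, outputs))
# ([(5, 1), (8, 2), (9, 3)], [0, 1, 3])
```

Under these assumptions, the equality check on (token, depth) is the whole acceptance rule. No probabilities are compared, which is consistent with greedy verification.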
traverse_verify_tree() is only about 100 lines in total. Apart from the snippet above, it is mostly logging.
The current implementation performs greedy decoding, and we are working on a PR for multi-step stochastic sampling and verification. Were the incorrect outputs generated using greedy or stochastic decoding?
The incorrect outputs were generated with stochastic decoding, following Algorithm 2 in the SpecInfer paper. When I use the greedy verify from Algorithm 2, the same prompt produces the same result as SpecInfer.
My prompt: please introduce Kobe Bryant, who played basketball in NBA.
SpecInfer output:
I'm not sure if you're being sarcastic or not, but Kobe Bryant is a basketball player.
I'm not sure if you're being sarcastic or not, but Kobe Bryant is a basketball player.
I'm not sure if you're being sarcastic or not, but Kobe Bryant is a basketball player.
I'm not sure if you're being sarcastic or not, but Kobe Bryant is a basketball player.
I'm not sure if you're being sarcastic or not, but Kobe Bryant is a basketball player.
I'm not sure if you're being sarcastic or
Output of my greedy verify implementation:
I'm not sure if you're being sarcastic or not, but Kobe Bryant is a basketball player.
I'm not sure if you're being sarcastic or not, but Kobe Bryant is a basketball player.
I'm not sure if you're being sarcastic or not, but Kobe Bryant is a basketball player.
I'm not sure if you're being sarcastic or not, but Kobe Bryant is a basketball player.
I'm not sure if you're being sarcastic or
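For comparison with the greedy case, the stochastic verification of one node's children can be sketched as below. This reflects my reading of multi-step speculative sampling and is not taken from the FlexFlow source: `verify_children_stochastic` is a hypothetical name, `p_llm` is assumed to be the verify model's next-token probabilities (not logits), and each candidate is assumed to carry the draft distribution that proposed it. Two details are easy to miss in an implementation: the acceptance test uses the ratio of probabilities, and after each rejection the verifier distribution is replaced by the renormalized residual.

```python
import random
from typing import List, Optional, Sequence, Tuple


def verify_children_stochastic(
    p_llm: Sequence[float],
    candidates: Sequence[Tuple[int, Sequence[float]]],
    rng: random.Random,
) -> Tuple[int, Optional[int]]:
    """Pick the next token at one tree node.

    `candidates` holds (token, q) pairs, where q is the draft distribution that
    proposed `token`. Returns (token, index of the accepted candidate), or
    (token, None) when every candidate is rejected and the token is drawn from
    the residual verifier distribution instead.
    """
    p: List[float] = list(p_llm)
    remaining = list(range(len(candidates)))
    while remaining:
        idx = rng.choice(remaining)
        token, q = candidates[idx]
        # Accept with probability min(1, p[token] / q[token]); written without
        # a division so that q[token] == 0 cannot raise.
        if rng.random() * q[token] <= p[token]:
            return token, idx
        # Rejected: continue with the renormalized residual max(0, p - q).
        residual = [max(0.0, a - b) for a, b in zip(p, q)]
        total = sum(residual)
        if total <= 0.0:
            break  # p and q coincide up to rounding; keep p unchanged
        p = [x / total for x in residual]
        remaining.remove(idx)
    token = rng.choices(range(len(p)), weights=p)[0]
    return token, None


rng = random.Random(0)
# The verifier puts all its mass on token 2, while the draft proposed token 0.
print(verify_children_stochastic([0.0, 0.0, 1.0], [(0, [1.0, 0.0, 0.0])], rng))
# (2, None)
```

If a Python port compares logits, skips the residual update, or samples the fallback token from the original distribution, its outputs can drift from what the verify model alone would generate, so those are reasonable places to check first.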
I'm having trouble understanding the SpecInfer source code.
The pseudocode in the SpecInfer paper for how the verify model verifies the draft model's output is as follows:
I implemented this pseudocode in Python, but the output looks abnormal and does not seem to be correct.
I set max_length=100 and lookahead=4 (the number of draft model inference steps). The verify model is opt-6.7b and the draft model is opt-125m.
I hoped to solve the problem by referring to the SpecInfer source code, but I find it very difficult to follow. My guess is that the part where the verify model checks the draft model's output lives in the prepare_next_batch_init and traverse_verify_tree functions in request_manager.cc, but I can't quite understand them.
Here is the above pseudocode implemented in Python:
Here is a description of TreeNode:
I hope someone can help me.