Open zikun-li opened 6 months ago
I remember that we cannot use vector here since this data struct will be sent to GPU. That's the reason we use preallocated arrays for everything. Correct me if I am wrong.
I see. Thanks for pointing this out! If you still remember it, could you please find the code to pass the data structure to GPU?
The data structure is automatically memcpy'd from CPU's address space to GPU's address space. Therefore, using any stl data structure is problematic since std::vector
is essentially a pointer to an array, and the memcpy only copies the pointer instead of the actual array from CPU's to GPU's address space. The issue is that you cannot access the vector's elements from a GPU thread.
You should replace vectors with arrays in the data structure
Okay, that make sense. It seems that if we preallocate the array, we have to set the size of the arrays in TreeSearchPerRequestInfo
to be equal to our budget, and do we have access to the function that passes this data structure to the GPU?
Does the following look good? Btw, which one is the budget, MAX_NUM_TOKENS
or MAX_SPEC_TREE_TOKEN_NUM
?
class TreeSearchBatchConfig : public BatchConfig {
public:
TreeSearchBatchConfig();
TreeSearchBatchConfig(int model_id);
TreeSearchBatchConfig(TreeSearchBatchConfig const &other, int model_id);
InferenceMode get_mode() const;
~TreeSearchBatchConfig();
friend std::ostream &operator<<(std::ostream &os,
TreeSearchBatchConfig const &bc);
void print() const;
void save_to_file(std::string const &filename) const;
int current_depth_all_requests() const;
int get_speculative_request_num() const;
// how many requests is in speculative phase
int speculative_request_num = 0;
int model_id;
struct TreeSearchPerRequestInfo {
int current_depth = -1;
BatchConfig::TokenId tokens_arr[BatchConfig::MAX_NUM_TOKENS];
float probs_arr[BatchConfig::MAX_NUM_TOKENS];
int parent_id_arr[BatchConfig::MAX_NUM_TOKENS];
};
TreeSearchPerRequestInfo tree_requests_info[MAX_NUM_REQUESTS];
};
MAX_NUM_TOKEN
is the max tokens in a batch, so I think the budget should be MAX_SPEC_TREE_TOKEN_NUM
. @xinhaoc Can you confirm with that? Thank you.
Thanks for the comment. Is MAX_SPEC_TREE_TOKEN_NUM
the budget for a single token or the whole batch?
It should be the budget for single request. The entire batch should be MAX_SPEC_TREE_TOKEN_NUM * MAX_NUM_REQUESTS
This issue is opened to discuss the plan to modify
BeamSearchBatchConfig
To support general tree structures. As this data structure is related to many on-going projects, we want to make sure that the change to it won't block others' development.The current
BeamSearchBatchConfig
class is as follows:Several redundant member variables and methods are listed as follows. Please take a look at them since the redundancy of some of them may need further verify:
beam_width
: we need to remove this since we want to support general tree structures.target_iterations
: not used.current_iteration
: not used.sub_requests
: not used.done
: not used.MAX_BEAM_WIDTH
: remove to support general tree structures.MAX_BEAM_DEPTH
: remove to support general tree structures.MAX_SPECULATIVE_TREE_BRANCHES
: remove to support general tree structures.BeamSearchPerRequestInfo::beam_size
,BeamSearchPerRequestInfo::max_depth
andBeamSearchPerRequestInfo::sub_request_num
: remove to support general tree structures.The proposed new data structure to substitute
BeamSearchBatchConfig
is as follows:All the arrays are replaced with
vector
, please let us know if that could cause a problem.