Mutinifni / splitwise-sim

LLM serving cluster simulator
MIT License

Question regarding the performance model #2

Open hdliu21 opened 2 days ago

hdliu21 commented 2 days ago

Thanks for the great work. Regarding the performance model, linear predictors are used to predict the latency of the prompt and token phases. When I looked into the implementation of the token time predictor, I found that the batch_tokens variable is calculated as the number of tasks in the batch (https://github.com/Mutinifni/splitwise-sim/blob/8f99e7dc9b407f4ce2488d03dd44c0b8b946dab0/performance_model.py#L230), which is generally a small number, from 1 to a few dozen. However, the token time predictor is built from the prompt size, which ranges from 128 to 32768 (https://github.com/Mutinifni/splitwise-sim/blob/8f99e7dc9b407f4ce2488d03dd44c0b8b946dab0/performance_model.py#L117). There is therefore a mismatch between the range of the key used to build the predictor and the range of the key used at prediction time. Since token time can be closely tied to the KV cache size, I suspect we should also use the batched prompt size to predict the token time, similar to how it is done for prompt time prediction (https://github.com/Mutinifni/splitwise-sim/blob/8f99e7dc9b407f4ce2488d03dd44c0b8b946dab0/performance_model.py#L227). Could you confirm whether my understanding is correct? Thanks.
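
To make the mismatch I mean concrete, here is a rough sketch; the data, fitting method, and helper names are made up for illustration, not copied from performance_model.py:

```python
import numpy as np

# Predictor fitted on prompt sizes (roughly 128 to 32768 in the profiling range);
# the latencies here are made-up values for illustration.
prompt_sizes = np.array([128, 512, 2048, 8192, 32768], dtype=float)
iteration_times_ms = np.array([12.0, 14.0, 20.0, 45.0, 150.0])
slope, intercept = np.polyfit(prompt_sizes, iteration_times_ms, deg=1)

def predict_token_time(batch_tokens: float) -> float:
    """Linear prediction of iteration time from the batched token count."""
    return slope * batch_tokens + intercept

# For a decode batch, batch_tokens equals the number of tasks (1 to a few dozen),
# which is far below the 128-32768 range the predictor was fitted on.
print(predict_token_time(8))
```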

Mutinifni commented 1 day ago

The time depends on the number of new tokens being processed in the iteration. For the decode phase (that is, TokenTask), each request/task generates only 1 new token per iteration. This is why decode phases underutilize compute and benefit from larger batch sizes. PromptTasks process several new tokens together in each iteration (up to the prompt size).
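
Roughly, the per-iteration token counting works like this (a minimal sketch with hypothetical task classes, not the simulator's actual API):

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical task classes for illustration only.
@dataclass
class PromptTask:
    prompt_size: int  # all prompt tokens are processed together in one iteration

@dataclass
class TokenTask:
    pass  # decode: exactly one new token per request per iteration

def new_tokens_in_iteration(task: Union[PromptTask, TokenTask]) -> int:
    """Number of new tokens a task contributes to the current iteration."""
    return task.prompt_size if isinstance(task, PromptTask) else 1

# A mixed batch: one prompt task plus three decode tasks.
batch = [PromptTask(prompt_size=2048), TokenTask(), TokenTask(), TokenTask()]
batch_tokens = sum(new_tokens_in_iteration(t) for t in batch)
print(batch_tokens)  # 2048 + 1 + 1 + 1 = 2051
```

A decode-only batch therefore contributes only a handful of new tokens per iteration, which is why it underutilizes compute compared to a prompt batch of the same request count.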