Open wutong4012 opened 4 weeks ago
Hi! Thank you for your attention to our work.
We estimated the time cost of each component during LLM inference based on LLM-Viewer, with some modifications to improve the accuracy of our estimations. We apologize for not citing this in the paper and will update the reference accordingly.
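For readers unfamiliar with this kind of analytic estimate, here is a minimal roofline-style sketch of how a KV loading time could be derived from the cache size and memory bandwidth. It is not the authors' code and not LLM-Viewer's API: the function names, the model shapes, and the bandwidth figure are all hypothetical, and the modifications mentioned above are not reflected.

```python
# Hypothetical sketch of a roofline-style estimate; not the paper's method.

def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Bytes occupied by the KV cache (factor 2 = one key + one value tensor)."""
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


def kv_load_time_s(kv_bytes, mem_bandwidth_bytes_per_s):
    """Time to stream the KV cache from memory once, ignoring overlap and caching."""
    return kv_bytes / mem_bandwidth_bytes_per_s


def roofline_time_s(flops, bytes_moved, peak_flops_per_s, mem_bandwidth_bytes_per_s):
    """Roofline estimate: an operator is bound by compute or by memory traffic."""
    compute_time = flops / peak_flops_per_s
    memory_time = bytes_moved / mem_bandwidth_bytes_per_s
    return max(compute_time, memory_time)


# Assumed shapes: 32 layers, 32 KV heads, head_dim 128, 4096 tokens, fp16.
kv_bytes = kv_cache_bytes(num_layers=32, num_kv_heads=32, head_dim=128, seq_len=4096)

# Assumed memory bandwidth of 2e12 bytes/s (a made-up round number).
load_time = kv_load_time_s(kv_bytes, mem_bandwidth_bytes_per_s=2e12)

print(f"KV cache: {kv_bytes / 2**30:.2f} GiB, load time: {load_time * 1e3:.3f} ms")
```

In an estimate like this, the KV loading term is separated from other components simply because it is computed from its own byte count, rather than being measured with a timer at runtime.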
Thanks for the reference.
Please forgive my ignorance, but I didn't see the time calculation for different parts in LLM-Viewer.
If it's convenient, could you provide more specific details? For example, did you use the `time` package to calculate the time?
Thank you again for your open source contribution.
Thank you very much for your interesting work and contribution to the open source community.
I would like to ask: how is the KV loading time calculated in your paper, and how is it separated from the time spent in other components?