Open shirohasuki opened 2 months ago
In most cases the global buffer is actually the L2 cache. For these buffers, LLMCompass assumes that only the latency of the first memory access is exposed; the latency of the accesses that follow is hidden, so what matters is bandwidth.
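To make that assumption concrete, here is a rough sketch of a "first access latency plus bandwidth" cost model. This is only an illustration, not LLMCompass's actual code: the function name `transfer_time_s`, its parameters, and the numbers (1 TB/s, 100 ns) are all made up for the example.

```python
def transfer_time_s(num_bytes, bandwidth_bytes_per_s, first_access_latency_s=0.0):
    """Estimate the time to stream num_bytes from a buffer.

    Hypothetical model: only the first access pays the latency;
    every later access is assumed to overlap with data already in flight,
    so the rest of the cost is purely bandwidth-bound.
    """
    if num_bytes <= 0:
        return 0.0
    return first_access_latency_s + num_bytes / bandwidth_bytes_per_s


# Illustrative (assumed) numbers: 1 TB/s bandwidth, 100 ns first-access latency.
BANDWIDTH = 1e12
LATENCY = 1e-7

small = transfer_time_s(64, BANDWIDTH, LATENCY)          # one 64-byte line
large = transfer_time_s(64 * 2**20, BANDWIDTH, LATENCY)  # a 64 MiB tile

print(f"small transfer: {small:.3e} s, latency share {LATENCY / small:.1%}")
print(f"large transfer: {large:.3e} s, latency share {LATENCY / large:.1%}")
```

With these assumed numbers, the latency term dominates a single small access but becomes a negligible share of a large streaming transfer, which is why bandwidth is the term that matters for big tiles.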
Hi, very AMAZING work! I like this. Thank you for your contributions.
One thing puzzles me. I see that both the computation and io_module have latency in the code, but I can't find the latency of the memory model, i.e. the latency of getting data from the local buffer (SRAM) or the global buffer (HBM), as opposed to latency on the link.
If this is already covered in your paper, I apologize for missing it.
Thanks!