This PR adds two classes that are fundamental to page attention in JetStream:
PageAttentionManager:
This class allocates and frees pages, computes page metadata, and supports cache insertion.
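To make the bookkeeping concrete, here is a minimal pure-Python sketch of what a page manager does. It is not the JetStream `PageAttentionManager` API. The class name `PageManagerSketch`, its methods (`reserve`, `free`, `metadata`), and the choice of metadata (page list, length, physical page and offset of the last token) are all hypothetical and chosen only for illustration.

```python
class PageManagerSketch:
    """Hypothetical page bookkeeping; NOT the JetStream PageAttentionManager API."""

    def __init__(self, total_num_pages, page_size):
        self.page_size = page_size
        self.free_pages = list(range(total_num_pages))  # pool of unused page ids
        self.page_table = {}  # slot -> physical page ids, in logical order
        self.lengths = {}     # slot -> number of tokens currently stored

    def reserve(self, slot, num_tokens):
        """Ensure `slot` owns enough pages to hold `num_tokens` tokens."""
        pages = self.page_table.setdefault(slot, [])
        needed = -(-num_tokens // self.page_size)  # ceil(num_tokens / page_size)
        if needed - len(pages) > len(self.free_pages):
            raise RuntimeError("out of pages")
        while len(pages) < needed:
            pages.append(self.free_pages.pop(0))  # take the lowest free page id
        self.lengths[slot] = num_tokens

    def free(self, slot):
        """Return all of a slot's pages to the free pool."""
        self.free_pages.extend(self.page_table.pop(slot, []))
        self.lengths.pop(slot, None)

    def metadata(self, slot):
        """(pages, length, physical page of last token, offset within that page)."""
        pages = self.page_table[slot]
        length = self.lengths[slot]
        last = length - 1  # position of the most recently stored token
        return pages, length, pages[last // self.page_size], last % self.page_size
```

For example, with `page_size=4`, reserving 6 tokens for slot 0 takes pages `[0, 1]`, and the last token (position 5) sits in physical page 1 at offset 1. Freed pages go back to the pool and are handed out again to later requests.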
PageKVCacheGenerate:
This class updates the decode KV cache in page-attention format. Unlike the standard LLM KV cache, which has shape [batch_size, num_heads, seq_len, head_dim], PageKVCache has shape [num_heads, total_num_pages, page_size, head_dim].
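The layout difference means a decode step cannot just write at `[b, :, pos, :]`. Each slot's logical position has to be translated into a (physical page, offset) pair through that slot's page table. The sketch below illustrates that translation with nested Python lists. The helper names `make_paged_cache` and `decode_update` are hypothetical, and the page tables are assumed to come from something like a page manager. A real implementation would use array scatter operations rather than Python loops.

```python
def make_paged_cache(num_heads, total_num_pages, page_size, head_dim):
    """Zero cache laid out as [num_heads, total_num_pages, page_size, head_dim]."""
    return [[[[0.0] * head_dim for _ in range(page_size)]
             for _ in range(total_num_pages)]
            for _ in range(num_heads)]


def decode_update(cache, new_kv, page_tables, lengths, page_size):
    """Hypothetical helper: write one decoded token per slot into the paged cache.

    new_kv: [batch_size, num_heads, head_dim], the K (or V) of the new token.
    page_tables[b]: physical page ids owned by slot b, in logical order.
    lengths[b]: tokens already cached for slot b (= the new token's position).
    Returns the updated lengths.
    """
    new_lengths = []
    for b, token_kv in enumerate(new_kv):
        pos = lengths[b]
        page = page_tables[b][pos // page_size]  # physical page holding this position
        offset = pos % page_size                 # row within that page
        for h, head_vec in enumerate(token_kv):
            cache[h][page][offset] = list(head_vec)
        new_lengths.append(pos + 1)
    return new_lengths
```

For instance, with `page_size=2` and slot 0 owning pages `[3, 1]` with 2 tokens cached, its next token (position 2) lands in physical page 1 at offset 0, even though page 1 is not adjacent to page 3 in the cache.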