Currently, the implementation of cascade inference is two-level, composed of SinglePrefillWithKVCache (or SingleDecodeWithKVCache) and BatchPrefillWithPagedKVCacheWrapper (or BatchDecodeWithPagedKVCacheWrapper). I wonder whether batch prefill or decode attention without a paged KV cache can also be used in cascade inference?
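
To make the question concrete, here is a minimal pure-Python sketch of the merge step that a two-level scheme relies on, assuming the levels are combined by merging partial attention outputs with their log-sum-exp values. It does not call FlashInfer: `attention_with_lse` and `merge_states` are hypothetical helper names, and the example handles a single query and a single head. The point it illustrates is that the merge only needs each level's output and log-sum-exp, not any particular KV-cache layout.

```python
import math

def attention_with_lse(q, keys, values):
    """Softmax attention of one query over one KV segment.

    Returns the attention output and the log-sum-exp of the scaled scores.
    (Hypothetical helper for illustration, not a FlashInfer function.)
    """
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    denom = sum(weights)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) / denom
           for d in range(dim)]
    return out, m + math.log(denom)

def merge_states(out_a, lse_a, out_b, lse_b):
    """Merge two partial attention results computed over disjoint KV segments."""
    m = max(lse_a, lse_b)
    w_a = math.exp(lse_a - m)
    w_b = math.exp(lse_b - m)
    merged = [(w_a * a + w_b * b) / (w_a + w_b) for a, b in zip(out_a, out_b)]
    return merged, m + math.log(w_a + w_b)

# Toy data: a shared prefix (level 1) and a per-request suffix (level 2).
q = [0.5, -1.0, 0.25, 2.0]
shared_k = [[1.0, 0.0, 0.5, -0.5], [0.2, 0.3, -1.0, 0.7], [0.0, 1.0, 1.0, 0.1]]
shared_v = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
unique_k = [[-0.4, 0.9, 0.3, 1.2], [0.6, -0.2, 0.8, 0.0]]
unique_v = [[-2.0, 1.5], [0.25, 4.0]]

# Each level attends to its own segment independently...
out_shared, lse_shared = attention_with_lse(q, shared_k, shared_v)
out_unique, lse_unique = attention_with_lse(q, unique_k, unique_v)

# ...and the merged result matches attention over the concatenated KV.
merged_out, merged_lse = merge_states(out_shared, lse_shared,
                                      out_unique, lse_unique)
full_out, full_lse = attention_with_lse(q, shared_k + unique_k,
                                        shared_v + unique_v)
```

Under this assumption, any attention kernel that can return its output together with the log-sum-exp could in principle supply one of the two levels; whether the non-paged batch kernels expose that in the library is exactly what the question asks.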