-
### Describe the issue
Hi
I was referred here from https://github.com/vllm-project/vllm/issues/6701.
I am wondering when IPEX 2.3.110 will be released.
-
### 📚 The doc issue
I don't think it's possible to obtain the dataset structure depicted in the diagram below.
### Suggest a potential alternative/fix
I don't k…
-
Thank you for your solid work. I would like to ask whether the current version supports models with a GQA architecture, such as LLaMA-2-70B and LLaMA-3.
-
Great job!
We found that Quest is built on a previous version of flashinfer, and some common features are currently not supported:
* bsz > 1
* GQA
* CUDA graph
Is there any plan to update t…
-
Great job!
I found that the checkpoint links for GQA are broken, e.g. "GQA-SGCls-1 checkpoint".
Could you please re-upload the "GQA-SGCls-1 checkpoint"?
Thanks again for the work you do!
-
Currently these are inferred from a combination of other configuration options, such as device and dtype. It would be more flexible for downstream users if they could be selected explicitly.
-
Hi! I'm trying to replicate your implementation with Llama 2-13B and 7B, but the runtimes don't make sense: Llama 2 with GQA takes longer than Llama 2 WITHOUT GQA. There is a little difference between my code …
-
I cannot find any files at the GQA dataset split link or at the OneDrive link for the GQA pretrained object detector.
Could you please check? Thank you.
-
```
|-- gqa-inpaint
| |-- images
| |-- images_inpainted
| |-- masks
| |-- train_scenes.json
| `-- meta_info.json
`-- MagicBrush
    |-- data
…
```
-
In GQA, only one copy of the KV cache is stored per group, but snapKV stores the KV cache with `num_key_value_heads * num_key_value_groups` heads. Indeed, during KV cache eviction, the choice might be diff…
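To make the storage difference concrete, here is a minimal sketch that compares per-token KV cache size when storing one copy per GQA group versus one copy per query head. It assumes a LLaMA-2-70B-like configuration (80 layers, 64 attention heads, 8 KV heads, head dimension 128, fp16). The helper names `kv_cache_bytes_per_token` and `group_scores` are hypothetical, and averaging token scores across the query heads of a group is only one possible way to make a single eviction decision per KV head, not snapKV's actual method.

```python
def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int, head_dim: int,
                             bytes_per_elem: int = 2) -> int:
    """Bytes of K and V cache per token (hypothetical helper, fp16 by default)."""
    # Two tensors (K and V) per layer, each holding num_kv_heads * head_dim elements.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem


def group_scores(per_query_head_scores, num_key_value_groups):
    """Average per-query-head token scores within each GQA group.

    Hypothetical eviction policy: query heads g*n .. (g+1)*n - 1 share KV head g,
    so their scores are merged into one score vector per KV head.
    """
    num_heads = len(per_query_head_scores)
    num_kv_heads = num_heads // num_key_value_groups
    grouped = []
    for g in range(num_kv_heads):
        heads = per_query_head_scores[g * num_key_value_groups:(g + 1) * num_key_value_groups]
        seq_len = len(heads[0])
        grouped.append([sum(h[t] for h in heads) / len(heads) for t in range(seq_len)])
    return grouped


# Assumed LLaMA-2-70B-like configuration.
num_layers = 80
num_attention_heads = 64
num_key_value_heads = 8
head_dim = 128
num_key_value_groups = num_attention_heads // num_key_value_heads  # 8

# One KV copy per group versus one copy per query head (expanded).
grouped = kv_cache_bytes_per_token(num_layers, num_key_value_heads, head_dim)
expanded = kv_cache_bytes_per_token(num_layers, num_key_value_heads * num_key_value_groups, head_dim)

if __name__ == "__main__":
    print(grouped, expanded, expanded // grouped)
```

Under these assumptions the expanded layout uses `num_key_value_groups` times more memory per token, which is the overhead of keeping `num_key_value_heads * num_key_value_groups` heads instead of one copy per group.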