-
First of all, thank you for the great work!
Is there any plan to support a paged KV cache in non-contiguous memory, for instance in flash_attn_with_kvcache?
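For context, a paged KV cache stores key/value entries in fixed-size physical blocks that need not be contiguous, with a block table mapping logical token positions to physical blocks. The following is a minimal pure-Python sketch of that indexing scheme; the class and names are illustrative, not flash-attn's actual interface.

```python
# Illustrative sketch of paged KV-cache indexing (in the spirit of
# vLLM's PagedAttention). Physical blocks are non-contiguous; a block
# table maps logical block indices to physical block indices.
# This is NOT the flash_attn_with_kvcache API, just the core idea.

BLOCK_SIZE = 4  # tokens per physical block

class PagedKVCache:
    def __init__(self):
        self.blocks = []        # physical blocks, each a list of (k, v) entries
        self.block_table = []   # logical block index -> physical block index
        self.seq_len = 0        # logical sequence length

    def append(self, k, v):
        """Append one token's (k, v), allocating a new block on a boundary."""
        if self.seq_len % BLOCK_SIZE == 0:
            self.block_table.append(len(self.blocks))
            self.blocks.append([])
        phys = self.block_table[self.seq_len // BLOCK_SIZE]
        self.blocks[phys].append((k, v))
        self.seq_len += 1

    def get(self, pos):
        """Resolve a logical position to its (k, v) via the block table."""
        phys = self.block_table[pos // BLOCK_SIZE]
        return self.blocks[phys][pos % BLOCK_SIZE]
```

An attention kernel with native paged-KV support performs this gather internally per query, which is why the cache blocks themselves never need to be contiguous in memory.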
-
**Describe the bug**
I am unable to profile my workload.
**Development Environment:**
- Linux Distribution: Docker Container running Ubuntu 22.04
- Omniperf Version: 2.0.1 (release)
- GPU: …
-
This subject has come up before in various forms:
https://github.com/bos/aeson/issues/227
https://github.com/bos/aeson/issues/181
The thing is that Aeson can't make a distinction between…
-
Hi, I encountered an out-of-workspace-memory error when trying to load the gemma-2-27b model using vllm with the flashinfer backend; the error appears to originate in flashinfer. I printed out the GPU mem…
-
### System Info
CUDA 12.3
Python 3.11.5
CentOS 7
3× NVIDIA P40 GPUs
### Who can help?
_No response_
### Information
- [X] The official example scripts
- [ ] My own mo…
-
I'm keen on adding [speculative decoding](https://arxiv.org/abs/2211.17192) to outlines.
Is this something that is being worked on? Otherwise I would be happy to submit a PR but I'd need some advic…
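As background on what the feature involves: in speculative decoding, a cheap draft model proposes a short run of tokens, which the target model then verifies in a single pass, accepting the longest matching prefix. Below is a toy greedy sketch with stand-in models; the function names and the toy next-token rules are assumptions for illustration, not the outlines API or the paper's exact rejection-sampling scheme.

```python
# Toy sketch of greedy speculative decoding. Both "models" here are
# trivial deterministic next-token functions, purely for illustration.

def draft_next(seq):
    # Cheap draft model (toy rule: next token is last token + 1 mod 10).
    return (seq[-1] + 1) % 10

def target_next(seq):
    # Expensive target model (toy rule: same, except it emits 0 after a 5).
    return 0 if seq[-1] == 5 else (seq[-1] + 1) % 10

def speculative_step(seq, k=4):
    """Draft k tokens greedily, then verify them against the target.

    Returns the accepted tokens. On a mismatch the target's own token is
    taken, so at least one token of progress is made per step; if every
    draft token is accepted, one bonus target token is appended.
    """
    draft, s = [], list(seq)
    for _ in range(k):
        t = draft_next(s)
        draft.append(t)
        s.append(t)

    accepted, s = [], list(seq)
    for t in draft:
        expected = target_next(s)
        if t == expected:
            accepted.append(t)
            s.append(t)
        else:
            accepted.append(expected)   # take the target's token and stop
            return accepted
    accepted.append(target_next(s))     # bonus token: all drafts accepted
    return accepted

print(speculative_step([3], k=4))  # → [4, 5, 0]
```

Here the first two draft tokens match the target and are accepted, the third diverges, and the target's token is substituted; in the real algorithm the accept/reject decision is probabilistic over the two models' distributions rather than an exact greedy match.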
-
### What happened?
Ever since I installed the new update, I can't right-click on any pages in Waterfox!
### Reproducible?
- [ ] I have checked that this issue cannot be reproduced on Mozilla Firef…
-
Great work!
I tried your [example](https://github.com/SafeAILab/EAGLE#:~:text=llama%2D2%2Dchat%5D-,With%20Code,-You%20can%20use) for llama-7b-chat and changed the tree structure in choices.py into …
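For readers unfamiliar with the format: draft trees in Medusa/EAGLE-style decoding are typically specified as a list of paths from the root, where each tuple gives the chain of top-k child indices. The values below are purely illustrative, not the actual contents of choices.py.

```python
# Illustrative shape of a draft-tree specification: each tuple is a path
# from the root, given as top-k child indices at each depth. Hypothetical
# values, not the repository's actual tree.
tree_choices = [
    (0,),       # top-1 child of the root
    (1,),       # top-2 child of the root
    (0, 0),     # top-1 child of the (0,) node
    (0, 1),
    (1, 0),
]

# Every non-root path's parent prefix must itself be in the tree.
valid = all(len(p) == 1 or p[:-1] in tree_choices for p in tree_choices)

# The tree's depth is the length of the longest path.
depth = max(len(p) for p in tree_choices)
print(valid, depth)  # → True 2
```

Changing the tree shape trades off draft breadth against depth, which is usually what tuning these structures is about.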
-
ipex-llm/python/dev/benchmark/all-in-one/run-srp.sh
Running Llama-2-7B-Chat-hf with bigdl_ipex_bf16 fails on an Intel(R) Xeon(R) w9-3475X
config.yaml
```yaml
repo_id:
# - 'THUDM/chatglm2-6b'
- 'meta-l…
-
### Before submitting a bug report
- [X] I updated to the latest version of Multi-Account Container and tested if I can reproduce the issue
- [X] I searched for existing reports to see if it hasn't a…