-
### Prerequisite
- [X] I have searched [Issues](https://github.com/open-compass/opencompass/issues/) and [Discussions](https://github.com/open-compass/opencompass/discussions) but cannot get the expe…
-
Recently, we have seen several awesome works focusing on KV cache compression, which report 1.7~2.3x speedups over FlashInfer. Could you please consider supporting such features?
Same layer KV…
-
### Motivation
In current large-model inference, the KV cache occupies a significant portion of GPU memory, so reducing its size is an important direction for improvement. Recently, severa…
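To make the memory pressure concrete, here is a back-of-the-envelope calculation of fp16 KV cache size. The model configuration below (32 layers, 32 KV heads, head dim 128, a Llama-2-7B-like shape) is an illustrative assumption, not taken from this issue.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
    """Total KV cache size in bytes; the leading 2 accounts for keys and values."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch * bytes_per_elem

# Hypothetical 7B-class config, batch 8, 4k context, fp16 (2 bytes/element)
size = kv_cache_bytes(num_layers=32, num_kv_heads=32, head_dim=128,
                      seq_len=4096, batch=8)
print(f"{size / 2**30:.0f} GiB")  # prints "16 GiB"
```

At this scale the cache alone rivals the weights of the model, which is why compression and quantization of the KV cache pay off.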
-
Study SOTA approaches and modern papers:
1. [SmoothQuant](https://arxiv.org/pdf/2211.10438.pdf) [github](https://github.com/mit-han-lab/smoothquant)
2. [AWQ](https://arxiv.org/pdf/2306.00978.pdf) [gi…
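As a rough illustration of the SmoothQuant idea referenced above, the toy NumPy sketch below migrates activation outliers into the weights with a per-channel scale so that both tensors become easier to quantize. This is a simplified assumption-laden sketch, not the paper's implementation; `alpha=0.5` mirrors the paper's default migration strength.

```python
import numpy as np

def smooth(X, W, alpha=0.5):
    """Toy SmoothQuant-style smoothing for Y = X @ W.

    Divides activations and multiplies weights by a per-input-channel
    scale s, so (X / s) @ (s * W) == X @ W mathematically, while the
    scaled activations have a flatter dynamic range.
    """
    act_max = np.abs(X).max(axis=0)                # per-channel activation range
    w_max = np.abs(W).max(axis=1)                  # per-channel weight range
    s = act_max ** alpha / np.maximum(w_max, 1e-8) ** (1 - alpha)
    s = np.maximum(s, 1e-8)                        # avoid division by zero
    return X / s, W * s[:, None]

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
X[:, 0] *= 50                                      # simulate one outlier channel
W = rng.normal(size=(8, 3))
Xs, Ws = smooth(X, W)
print(np.allclose(X @ W, Xs @ Ws))                 # prints "True"
```

After smoothing, the outlier channel's range is shared between activations and weights, which is what makes plain per-tensor int8 quantization viable in the paper.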
-
### Is your feature request related to a problem? Please describe.
GroupChat uses a nested conversation between two agents. Currently it does not utilise the recent TransformMessages capability nor…
-
### Describe the bug
When using Langchain ContextualCompressionRetriever, a "run not found" error was raised.
```
Traceback (most recent call last):
File "/lib/python3.11/site-packages/langfuse/cal…
-
## List
- tutorials
- [ ] #4 - @seochan99
- [ ] #5 - @seochan99
- [ ] #6 - @seochan99
- [ ] #17 - @bananana0118
- [ ] graph.mdx
- [ ] index.mdx
- [ ] llm_chain.mdx
- [ ]…
-
Hello,
First and foremost, I want to thank you for your incredible work!
I'd like more information on how to reproduce your results. I followed the instructions in your README, but I am unabl…
-
Sorry to raise this problem without giving a systematic analysis.
A more complete investigation of the "compression" ability of LLMs may take me more time, as many may support "compressio…
-
### Describe the issue
First of all, thank you for your great contributions.
I have a similar question to the [issue 146](https://github.com/microsoft/LLMLingua/issues/146), I cannot reproduce the…