-
### Describe the issue
I have an LLM fine-tuned for a downstream task using input-output pair data (`X_train`, `Y_train`).
Now I plan to use llmlingua2 to compress `X_test` --> `X_test_compre…
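The planned flow can be sketched as follows. This is a minimal sketch only: `compress` is a placeholder stand-in for llmlingua2's compressor, and `finetuned_model` is a hypothetical stub for the fine-tuned LLM's generate call.

```python
def compress(prompt: str, rate: float = 0.5) -> str:
    """Placeholder compressor that keeps roughly `rate` of the words.
    Stand-in only; in practice this would be llmlingua2."""
    words = prompt.split()
    keep = max(1, int(len(words) * rate))
    step = max(1, len(words) // keep)
    return " ".join(words[::step][:keep])

def finetuned_model(prompt: str) -> str:
    """Hypothetical stub for the fine-tuned LLM's generate call."""
    return f"answer for: {prompt}"

# Compress each test input, then run inference on the compressed prompts.
X_test = ["summarize the quarterly report in one sentence please"]
X_test_compressed = [compress(x, rate=0.5) for x in X_test]
Y_pred = [finetuned_model(x) for x in X_test_compressed]
```

The open question is whether a model fine-tuned on uncompressed `X_train` degrades when it sees compressed inputs at test time.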
-
Hi,
I tried your example from the main page:
```
git clone https://github.com/neuralmagic/sparseml
pip install -e "sparseml[transformers]"
wget https://huggingface.co/neuralmagic/TinyLlama-1.1B-Chat-v…
```
-
I noticed that when I set CHAT_SEARCH_KWARG_K too high, my embedding model cannot handle so many requests. However, I don't understand why this happens, as the chunks are already embedded and the question is sho…
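For context, in a typical retrieval setup a large `k` should not translate into more embedding requests: chunks are embedded once at index time, and only the question is embedded per query, with `k` merely controlling how many precomputed vectors are ranked and returned. A minimal sketch of that expected flow (the `embed` function here is a toy deterministic stand-in, not a real embedding model):

```python
import math

def embed(text: str) -> list[float]:
    """Toy deterministic embedding; each call stands for one request
    to the embedding service."""
    vec = [0.0] * 8
    for i, b in enumerate(text.encode()):
        vec[i % 8] += b
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

# Index time: every chunk is embedded exactly once.
chunks = ["chunk about cats", "chunk about dogs", "chunk about llamas"]
index = [(c, embed(c)) for c in chunks]

def search(question: str, k: int) -> list[str]:
    """One embedding request per query, independent of k."""
    q = embed(question)  # the only embed() call at query time

    def dot(a: list[float], b: list[float]) -> float:
        return sum(x * y for x, y in zip(a, b))

    ranked = sorted(index, key=lambda cv: dot(q, cv[1]), reverse=True)
    return [c for c, _ in ranked[:k]]
```

If raising `k` really multiplies requests to the embedding model, something in the pipeline is re-embedding per retrieved chunk rather than reusing the stored vectors.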
-
Thank you very much for your work! I encountered a problem while compressing an 8x1B MoE model, and I want to know whether you encountered it on LLMs and, if so, how you solved it. Any reply would be appreciated.
…
-
### Describe the issue
Thanks for the interesting work. I tried to reproduce the results of LLMLingua on the MeetingBank QA dataset with Mistral-7B as the target LLM.
The small LLM I use is https…
-
### Prerequisite
- [X] I have searched [Issues](https://github.com/open-compass/opencompass/issues/) and [Discussions](https://github.com/open-compass/opencompass/discussions) but cannot get the expe…
-
Study SOTA approaches and modern papers:
1. [SmoothQuant](https://arxiv.org/pdf/2211.10438.pdf) [github](https://github.com/mit-han-lab/smoothquant)
2. [AWQ](https://arxiv.org/pdf/2306.00978.pdf) [gi…
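As a baseline for studying these papers, it helps to have the naive round-to-nearest int8 weight quantization in hand, since both SmoothQuant and AWQ are refinements of it (rescaling activations/channels before quantizing). A minimal sketch of that baseline only, not either paper's algorithm:

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric round-to-nearest int8 quantization over one tensor.
    SmoothQuant/AWQ improve on this by rescaling channels first so that
    outliers waste less of the int8 range."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate fp values from int8 codes."""
    return [v * scale for v in q]

w = [0.12, -0.5, 0.33, 0.9]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)  # close to w, up to quantization error
```

The key weakness the papers address is visible here: a single large-magnitude outlier inflates `scale`, so all other weights land on only a few int8 levels.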
-
### Checked other resources
- [X] I added a very descriptive title to this issue.
- [X] I searched the LangChain documentation with the integrated search.
- [X] I used the GitHub search to find a sim…
-
### System Info / 系統信息
Python: Python 3.10.14
os:
```
DISTRIB_ID=Kylin
DISTRIB_RELEASE=V10
DISTRIB_CODENAME=kylin
DISTRIB_DESCRIPTION="Kylin V10 SP1"
DISTRIB_KYLIN_RELEASE=V10
DISTRIB_VER…
-
### Motivation
In current large-model inference, the KV cache occupies a significant portion of GPU memory, so reducing its size is an important direction for improvement. Recently, severa…
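To make the footprint concrete: the KV cache holds one K and one V tensor per layer, so its size is 2 × layers × kv_heads × head_dim × seq_len × batch × bytes_per_element. A quick sketch with illustrative 7B-class numbers (assumed shapes, not taken from any specific model card):

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, batch: int, dtype_bytes: int = 2) -> int:
    """KV-cache memory: 2x for the K and V tensors; fp16 -> 2 bytes/elem."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch * dtype_bytes

# Illustrative 7B-class shape: 32 layers, 32 KV heads, head_dim 128.
gib = kv_cache_bytes(32, 32, 128, seq_len=4096, batch=8) / 2**30  # 16.0 GiB
```

At batch 8 and 4K context this already rivals the fp16 weights of a 7B model, which is why KV-cache quantization and grouped/multi-query attention (fewer `kv_heads`) are attractive.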