-
**Describe the bug**
After a model is generated by running `big_model_fp8.py`, lm_eval does not work unless the .py files from the original base model are transferred into the generated model folder. Happe…
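Until the export carries those files over itself, a minimal workaround sketch (the paths are hypothetical; point them at your local checkpoints) that replicates the manual transfer described above could look like:

```python
import shutil
from pathlib import Path

# Hypothetical paths; adjust to your local checkpoints.
base_model_dir = Path("/models/original-base-model")
generated_dir = Path("/models/big_model_fp8-output")

# Copy the custom modeling/config .py files that the FP8 export did not
# carry over, so lm_eval can load the generated model with trust_remote_code.
for py_file in base_model_dir.glob("*.py"):
    shutil.copy2(py_file, generated_dir / py_file.name)
```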
-
When running the provided example code for the TTT (Learning to Learn at Test Time) model, the output generated by the model is not coherent or meaningful. The expected output for the prompt "Greeting…
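One variable worth ruling out is decoding randomness; a minimal greedy-decoding sketch with transformers (the checkpoint id and prompt are placeholders, not the values from the report) would be:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical checkpoint id; substitute the TTT weights being tested.
model_id = "your-org/ttt-1b"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tok("Hello", return_tensors="pt")  # placeholder prompt
# Greedy decoding (do_sample=False) removes sampling noise when judging
# whether the model's output is coherent.
out = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))
```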
-
### Checked other resources
- [X] I added a very descriptive title to this issue.
- [X] I searched the LangChain.js documentation with the integrated search.
- [X] I used the GitHub search to find a …
-
During evaluation, I noticed that the models have different context lengths, such as qwen 128k, and that length is measured using the gpt-3.5-turbo tokenizer. For qwen 128k, should it be set as 1…
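If the harness really does count tokens with the gpt-3.5-turbo tokenizer, a quick check with tiktoken (my own sketch, not the project's code) makes the potential mismatch with Qwen's tokenizer easy to see:

```python
import tiktoken

# gpt-3.5-turbo maps to the cl100k_base encoding in tiktoken.
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

def token_length(text: str) -> int:
    # Count tokens the way the evaluation presumably does: with the
    # gpt-3.5-turbo tokenizer, regardless of which model is evaluated.
    return len(enc.encode(text))

print(token_length("An example passage from a long-context benchmark."))
```

Qwen's tokenizer will generally produce a different count for the same text, so a 128k budget measured in gpt-3.5-turbo tokens is not exactly 128k Qwen tokens.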
-
### Is there an existing issue for the same bug?
- [X] I have checked the troubleshooting document at https://docs.all-hands.dev/modules/usage/troubleshooting
- [X] I have checked the existing iss…
-
- [X] I have checked that a similar [feature request](https://github.com/Genymobile/scrcpy/issues?q=is%3Aopen+is%3Aissue+label%3A%22feature+request%22) does not already exist.
I know this kind of …
-
### System Info / 系統信息
On Windows 11, with Python 3.10.9 and CUDA 12.1:
transformers 4.41.0
xinference 0.13.1
torch 2.3.1+cu121
torchaudio …
-
**Describe the bug**
Hello vLLM team, thank you for your outstanding work. I think llm-compressor really fills a need: one simple, unified quantization framework for vLLM.
So the bug I am enc…
-
What I basically want to achieve is re-ranking and prompt compression before adding the retrieved docs to the context.
I read that this could drastically improve RAG performance. I think right now t…
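To illustrate the re-ranking half (a sketch under my own assumptions; the cross-encoder checkpoint and the `rerank` helper are mine, not an existing API), a minimal pass over the retrieved docs could look like:

```python
from sentence_transformers import CrossEncoder

# Model choice is an assumption; any cross-encoder re-ranker works similarly.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, docs: list[str], top_k: int = 3) -> list[str]:
    # Score every (query, doc) pair, then keep the highest-scoring docs.
    scores = reranker.predict([(query, doc) for doc in docs])
    ranked = sorted(zip(scores, docs), key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked[:top_k]]

docs = ["retrieved chunk A ...", "retrieved chunk B ...", "retrieved chunk C ..."]
print(rerank("what does the user ask about?", docs, top_k=2))
```

Prompt compression would then run on the surviving top-k docs before they are appended to the context.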