-
### Bug description
When a user imports a model and starts an inference server for it, the entire user folder where the model is located gets mounted into the container, which causes perfor…
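For illustration only (this is not the project's actual launch code), a sketch using the docker-py API of binding just the model's own directory into the container instead of the whole user folder; the image name and paths are hypothetical placeholders:

```python
import os
import docker  # docker-py

model_path = "/home/user/models/my-model"  # hypothetical model location

client = docker.from_env()
container = client.containers.run(
    "inference-server:latest",  # hypothetical image name
    detach=True,
    volumes={
        # Bind only the model's directory (read-only), not the whole
        # home folder, so unrelated files are never scanned or copied.
        os.path.abspath(model_path): {"bind": "/models/my-model", "mode": "ro"},
    },
)
```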
-
### Checklist
- [X] 1. I have searched related issues but cannot get the expected help.
- [X] 2. The bug has not been fixed in the latest version.
- [X] 3. Please note that if the bug-related issue y…
-
I found your work incredibly interesting and insightful. Thank you for the great work!
I wanted to share [NaturalBench](https://arxiv.org/abs/2410.14669) (NeurIPS'24 D&B), a collaborative project b…
-
### Proposal to improve performance
_No response_
### Report of performance regression
_No response_
### Misc discussion on performance
Question:
When deploying LoRA with vLLM, suppose I have …
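The question is cut off above, but for context, this is how per-request LoRA adapters are typically passed through vLLM's offline API; the adapter name, id, and path below are placeholders:

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Base model with LoRA support enabled.
llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True)
sampling = SamplingParams(temperature=0.0, max_tokens=64)

outputs = llm.generate(
    ["Summarize: ..."],
    sampling,
    # Each request can carry its own adapter: (name, integer id, local path).
    lora_request=LoRARequest("my_adapter", 1, "/path/to/lora_adapter"),
)
```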
-
### Proposal to improve performance
I am trying to run the phi3.5 vision instruct model with around 10k prompts. What I noticed is that as the number of prompts increases, my CPU RAM consumption keeps increasing and ev…
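One way to bound host memory while investigating (a sketch, assuming the vLLM offline API; the image inputs the vision model normally takes are omitted for brevity) is to feed prompts in fixed-size chunks and write each batch's results out immediately, instead of holding all 10k requests and outputs in memory at once:

```python
import json
from vllm import LLM, SamplingParams

llm = LLM(model="microsoft/Phi-3.5-vision-instruct", trust_remote_code=True)
sampling = SamplingParams(max_tokens=128)

def run_in_chunks(prompts, chunk_size=256, out_path="results.jsonl"):
    """Process prompts in fixed-size chunks so CPU RAM stays bounded."""
    with open(out_path, "a") as f:
        for i in range(0, len(prompts), chunk_size):
            batch = prompts[i : i + chunk_size]
            for out in llm.generate(batch, sampling):
                f.write(json.dumps({"prompt": out.prompt,
                                    "text": out.outputs[0].text}) + "\n")
            # This chunk's outputs go out of scope here instead of accumulating.
```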
-
# Metrics
Precision and recall were calculated to evaluate the performance of each model and to compare performance across models. A confusion matrix was plotted for better understanding.
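As a minimal sketch, the same metrics can be computed with scikit-learn; the labels below are placeholder data:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import (precision_score, recall_score,
                             confusion_matrix, ConfusionMatrixDisplay)

y_true = [0, 1, 1, 0, 1, 0, 1]  # placeholder ground-truth labels
y_pred = [0, 1, 0, 0, 1, 1, 1]  # placeholder model predictions

precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
print(f"precision={precision:.3f} recall={recall:.3f}")

cm = confusion_matrix(y_true, y_pred)
ConfusionMatrixDisplay(cm).plot()
plt.show()
```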
## Precis…
-
Hi @jacobbieker ,
Happy new year!:)
I was wondering whether you ever measured the performance of your models with this code. For example, is it comparable to Keisler et al.?
I saw there are some pretrained weights…
-
### What happened?
Hi,
When I use llama.cpp to deploy a pruned llama3.1-8b model, an unbearable performance degradation appears:
We used a structured pruning method (LLM-Pruner) to prune llama3.1-8b, w…
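To isolate whether the degradation comes from the GGUF conversion or from the pruning itself, one sanity check (a sketch; both checkpoint paths are placeholders) is to run the same prompt against the pruned Hugging Face checkpoint and against its GGUF conversion via llama-cpp-python, and compare the outputs:

```python
from llama_cpp import Llama
from transformers import AutoModelForCausalLM, AutoTokenizer

prompt = "The capital of France is"

# Pruned model after conversion to GGUF (placeholder path).
gguf = Llama(model_path="./llama3.1-8b-pruned.gguf", n_ctx=512)
print("llama.cpp:", gguf(prompt, max_tokens=16)["choices"][0]["text"])

# Same pruned checkpoint before conversion (placeholder path).
tok = AutoTokenizer.from_pretrained("./llama3.1-8b-pruned-hf")
model = AutoModelForCausalLM.from_pretrained("./llama3.1-8b-pruned-hf")
ids = tok(prompt, return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=16, do_sample=False)
print("transformers:", tok.decode(out[0][ids.shape[1]:], skip_special_tokens=True))
```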
-
### Feature request
This request aims to introduce functionality to delete specific adapter layers integrated with PEFT (Parameter-Efficient Fine-Tuning) within the Hugging Face Transformers librar…
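For context, PEFT already exposes model-level adapter removal via `delete_adapter` on `PeftModel`; the request above goes further, asking for deletion of specific adapter layers inside Transformers. A minimal sketch of the existing model-level lifecycle, using GPT-2 as a stand-in base model:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("gpt2")
config = LoraConfig(r=8, target_modules=["c_attn"])

model = get_peft_model(base, config)     # registers adapter "default"
model.add_adapter("experiment", config)  # a second named adapter
model.set_adapter("experiment")          # route the forward pass through it

model.delete_adapter("experiment")       # existing model-level removal
model.set_adapter("default")
```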
-
![image](https://github.com/user-attachments/assets/107e5738-8b12-42a1-8229-d33c1f35dc3d)
There is no official quantized model, so will the team support an official quantized version or not? As you can s…
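Until an official quantized release exists, one user-side workaround (a sketch, assuming the checkpoint loads through transformers; the model id below is a placeholder for the checkpoint discussed above) is on-the-fly 4-bit quantization with bitsandbytes:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "org/model-name"  # placeholder for the checkpoint in question

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb,
    device_map="auto",
)
```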