-
Hi! In the recipe, if I do not want to quantize and only want to perform structured pruning, is it okay to set `quantize: false` as below and omit the QuantizationModifier from the recipe?
SparseGPTModif…
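A pruning-only recipe along those lines might look like the following sketch. This assumes the SparseML-style recipe format; the stage name, sparsity level, and mask structure are illustrative values, not a confirmed configuration:

```yaml
# Hypothetical pruning-only recipe: no QuantizationModifier anywhere,
# and quantization disabled on the SparseGPT step (values illustrative).
pruning_stage:
  obcq_modifiers:
    SparseGPTModifier:
      sparsity: 0.5
      mask_structure: "2:4"   # structured N:M pruning
      quantize: false         # skip quantization entirely
```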
-
### Feature request
We have a `prune_heads()` method on the `AutoModel` class, but not on `AutoModelForCausalLM`. Please add a `prune_heads()` method to the `AutoModelForCausalLM` class as well.
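For context, `prune_heads()` takes a `{layer_index: [head_indices]}` dict and internally removes the rows of the attention projection matrices that belong to the pruned heads. A minimal NumPy sketch of that row-slicing step (function name and shapes are illustrative, not the transformers API):

```python
import numpy as np

def prune_heads_from_projection(W, num_heads, heads_to_prune):
    """Drop the output rows of an attention projection matrix that
    belong to pruned heads. W has shape (num_heads * head_dim,
    in_features); heads_to_prune is a set of head indices.
    Illustrative helper, not part of transformers."""
    head_dim = W.shape[0] // num_heads
    keep = [h for h in range(num_heads) if h not in heads_to_prune]
    rows = np.concatenate(
        [np.arange(h * head_dim, (h + 1) * head_dim) for h in keep]
    )
    return W[rows]

# Example: 2 heads of dim 2; pruning head 0 keeps only head 1's rows.
W = np.arange(12, dtype=float).reshape(4, 3)
W_pruned = prune_heads_from_projection(W, num_heads=2, heads_to_prune={0})
```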
### Motivation
M…
-
### Motivation
Llama 3.2's release presents a strong case for expanding lmdeploy with dedicated support:
* **Multi-mod…
-
### Feature request
This feature request proposes adding support for Meta's newly released Llama 3.2 models to lmdeploy. Llama 3.2 introduces exciting capabilities, including vision LLMs (11…
-
I have tried to quantize a model by following the guide ([PyTorch Quantization — Model Optimizer 0.15.0](https://nvidia.github.io/TensorRT-Model-Optimizer/guides/_pytorch_quantization.html)), and I ca…
-
- [x] I have read and agree to the [contributing guidelines](https://github.com/griptape-ai/griptape#contributing).
**Describe the bug**
It appears that SummaryConversationMemory is maintaining t…
-
Hi, and thanks for the amazing repo.
I have a bit of a tall request. SparseGPT uses a per-layer optimal brain surgeon approach to pruning. Here is the [PyTorch code](https://github.com/IST-DASLab/spar…
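For readers unfamiliar with it, the per-weight Optimal Brain Surgeon step that SparseGPT builds on can be sketched as follows. This is a toy single-weight version; SparseGPT itself prunes column blocks against a layer-wise reconstruction Hessian, so treat this only as the underlying idea:

```python
import numpy as np

def obs_prune_step(w, H_inv):
    """One Optimal Brain Surgeon step: zero the weight with the lowest
    saliency w_i^2 / (2 * [H^-1]_ii), then update the surviving weights
    by delta_w = -(w_i / [H^-1]_ii) * H^-1[:, i] to compensate."""
    saliency = w ** 2 / (2.0 * np.diag(H_inv))
    i = int(np.argmin(saliency))
    w = w - (w[i] / H_inv[i, i]) * H_inv[:, i]
    w[i] = 0.0  # enforce an exact zero for the pruned coordinate
    return w, i

# With an identity inverse Hessian, the step simply zeroes the
# smallest-magnitude weight and leaves the others unchanged.
w, i = obs_prune_step(np.array([3.0, 1.0, 2.0]), np.eye(3))
```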
-
Tracker issue for adding [LayerSkip](https://arxiv.org/abs/2404.16710) to AO.
This is a training and inference optimization that is similar to layer-wise pruning. It's particularly interesting for…
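LayerSkip trains with a layer-dropout rate that increases with depth, so early layers are almost always executed and deep layers are skipped more often. A simple linear sketch of such a depth-increasing schedule (the paper's exact curve may differ; `p_max` is an illustrative value):

```python
def layer_dropout_rates(num_layers, p_max=0.2):
    """Depth-increasing layer-dropout schedule: drop probability grows
    linearly from 0 at the first layer to p_max at the last layer.
    Illustrative helper, not the LayerSkip reference implementation."""
    if num_layers == 1:
        return [p_max]
    return [p_max * l / (num_layers - 1) for l in range(num_layers)]

rates = layer_dropout_rates(5, p_max=0.2)
```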
-
Traceback (most recent call last):
  File "/home/jovyan/honor/yangdong/LLM-Pruner-main/examples/baichuan.py", line 342, in <module>
    main(args)
  File "/home/jovyan/honor/yangdong/LLM-Pruner-main/exam…
-
### Your current environment
Python 3.8
GPU: 4× NVIDIA L20
vLLM 0.5.4
### Model Input Dumps
_No response_
### 🐛 Describe the bug
$ python -m vllm.entrypoints.api_server --model='/mntfn/yanyi/Qwen2-…