-
Hi! In the recipe, if I do not want to quantize and only want to perform structured pruning, is it okay to set `quantize: false` as below and omit the QuantizationModifier from the recipe?
SparseGPTModif…
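A pruning-only recipe along those lines might look like the following sketch. This assumes the SparseML-style recipe format; the stage name, sparsity level, and mask structure are illustrative values, not a confirmed configuration:

```yaml
# Hypothetical pruning-only recipe: no QuantizationModifier anywhere,
# and quantization disabled on the SparseGPT step (values illustrative).
pruning_stage:
  obcq_modifiers:
    SparseGPTModifier:
      sparsity: 0.5
      mask_structure: "2:4"   # structured N:M pruning
      quantize: false         # skip quantization entirely
```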
-
### Feature request
We have a `prune_heads()` method on the `AutoModel` class, but not on `AutoModelForCausalLM`. Please add a `prune_heads()` method to the `AutoModelForCausalLM` class as well.
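For context, `prune_heads()` takes a `{layer_index: [head_indices]}` dict and internally removes the rows of the attention projection matrices that belong to the pruned heads. A minimal NumPy sketch of that row-slicing step (function name and shapes are illustrative, not the transformers API):

```python
import numpy as np

def prune_heads_from_projection(W, num_heads, heads_to_prune):
    """Drop the output rows of an attention projection matrix that
    belong to pruned heads. W has shape (num_heads * head_dim,
    in_features); heads_to_prune is a set of head indices.
    Illustrative helper, not part of transformers."""
    head_dim = W.shape[0] // num_heads
    keep = [h for h in range(num_heads) if h not in heads_to_prune]
    rows = np.concatenate(
        [np.arange(h * head_dim, (h + 1) * head_dim) for h in keep]
    )
    return W[rows]

# Example: 2 heads of dim 2; pruning head 0 keeps only head 1's rows.
W = np.arange(12, dtype=float).reshape(4, 3)
W_pruned = prune_heads_from_projection(W, num_heads=2, heads_to_prune={0})
```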
### Motivation
M…
-
### Motivation
Llama 3.2's release presents a strong case for expanding lmdeploy with dedicated support:
* **Multi-mod…
-
### Feature request
This feature request proposes adding support for Meta's newly released Llama 3.2 models to lmdeploy. Llama 3.2 introduces exciting capabilities, including vision LLMs (11…
-
I have tried to quantize a model by following the guide ([PyTorch Quantization — Model Optimizer 0.15.0](https://nvidia.github.io/TensorRT-Model-Optimizer/guides/_pytorch_quantization.html)), and I ca…
-
- [x] I have read and agree to the [contributing guidelines](https://github.com/griptape-ai/griptape#contributing).
**Describe the bug**
It appears that SummaryConversationMemory is maintaining t…
-
Hi, and thanks for the amazing repo.
I have a bit of a tall request. SparseGPT uses a per-layer optimal brain surgeon approach to pruning. Here is the [PyTorch code](https://github.com/IST-DASLab/spar…
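For readers unfamiliar with it, the per-weight Optimal Brain Surgeon step that SparseGPT builds on can be sketched as follows. This is a toy single-weight version; SparseGPT itself prunes column blocks against a layer-wise reconstruction Hessian, so treat this only as the underlying idea:

```python
import numpy as np

def obs_prune_step(w, H_inv):
    """One Optimal Brain Surgeon step: zero the weight with the lowest
    saliency w_i^2 / (2 * [H^-1]_ii), then update the surviving weights
    by delta_w = -(w_i / [H^-1]_ii) * H^-1[:, i] to compensate."""
    saliency = w ** 2 / (2.0 * np.diag(H_inv))
    i = int(np.argmin(saliency))
    w = w - (w[i] / H_inv[i, i]) * H_inv[:, i]
    w[i] = 0.0  # enforce an exact zero for the pruned coordinate
    return w, i

# With an identity inverse Hessian, the step simply zeroes the
# smallest-magnitude weight and leaves the others unchanged.
w, i = obs_prune_step(np.array([3.0, 1.0, 2.0]), np.eye(3))
```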
-
Tracker issue for adding [LayerSkip](https://arxiv.org/abs/2404.16710) to AO.
This is a training and inference optimization that is similar to layer-wise pruning. It's particularly interesting for…
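LayerSkip trains with a layer-dropout rate that increases with depth, so early layers are almost always executed and deep layers are skipped more often. A simple linear sketch of such a depth-increasing schedule (the paper's exact curve may differ; `p_max` is an illustrative value):

```python
def layer_dropout_rates(num_layers, p_max=0.2):
    """Depth-increasing layer-dropout schedule: drop probability grows
    linearly from 0 at the first layer to p_max at the last layer.
    Illustrative helper, not the LayerSkip reference implementation."""
    if num_layers == 1:
        return [p_max]
    return [p_max * l / (num_layers - 1) for l in range(num_layers)]

rates = layer_dropout_rates(5, p_max=0.2)
```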
-
Traceback (most recent call last):
  File "/home/jovyan/honor/yangdong/LLM-Pruner-main/examples/baichuan.py", line 342, in <module>
    main(args)
  File "/home/jovyan/honor/yangdong/LLM-Pruner-main/exam…
-
### Your current environment
Python 3.8
GPU: 4× NVIDIA L20
vLLM 0.5.4
### Model Input Dumps
_No response_
### 🐛 Describe the bug
$ python -m vllm.entrypoints.api_server --model='/mntfn/yanyi/Qwen2-…