-
I have a Gemma 2 9B model that I quantized with AWQ-4bit; the model size is 5.9 GB. I set kv_cache_free_gpu_mem_fraction to 0.01 and run Triton on one A100, but Triton takes 10748 MiB of GPU memory. I expe…
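For context, a rough back-of-envelope of where that memory could be going, assuming an A100-80GB and that `kv_cache_free_gpu_mem_fraction` is applied to the GPU memory left free after the engine is loaded (the documented TensorRT-LLM behaviour); any number not quoted above is an assumption:

```py
# Back-of-envelope estimate, not a measurement.
observed_gib = 10748 / 1024     # ~10.5 GiB reported for the Triton process
weights_gib = 5.9               # AWQ-4bit Gemma 2 9B engine, from above
gpu_total_gib = 80.0            # assumption: A100-80GB
kv_fraction = 0.01              # kv_cache_free_gpu_mem_fraction

kv_pool_gib = kv_fraction * (gpu_total_gib - weights_gib)   # ~0.74 GiB
other_gib = observed_gib - weights_gib - kv_pool_gib        # ~3.9 GiB
print(f"KV-cache pool ~{kv_pool_gib:.2f} GiB, "
      f"~{other_gib:.1f} GiB left for CUDA context, activations and runtime buffers")
```

So even with a tiny KV-cache fraction, the process footprint is weights plus the KV pool plus a few GiB of CUDA context and engine activation/workspace buffers, not just the 5.9 GB of weights.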
-
```py
from unsloth import FastLanguageModel
from unsloth import is_bfloat16_supported
import torch
from unsloth.chat_templates import get_chat_template
from trl import SFTTrainer
from transform…
```
-
WARNING: LoadImageBatch.IS_CHANGED() got an unexpected keyword argument 'node_id'
D:\ComfyUI_windows\ComfyUI\models\clip\siglip-so400m-patch14-384
D:\ComfyUI_windows\ComfyUI\models\LLM\Meta-Llama-3.1-…
-
An error occurs when loading models in a for loop, as shown below.
What could be the problem?
```py
for peft_model_id in peft_model_ids:
    print(peft_model_id)
    model, tokenizer =…
```
-
Did you run experiments with 4-bit weight quantization? And/or did you try 4-bit activation quantization? If so, I'd be curious about the results; if not, why not?
-
Hi,
Thanks for releasing Grok! Is there any chance we could load the model in 4-bit given how large it is? Do you know if bitsandbytes support is planned (cc @timdettmers)?
Thanks!
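For reference, a minimal sketch of what 4-bit loading via bitsandbytes looks like in transformers, assuming a transformers-compatible checkpoint is available (the repo id below is only a placeholder):

```py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "xai-org/grok-1"  # placeholder; assumes a transformers-compatible checkpoint

# Standard bitsandbytes 4-bit setup: NF4 weights with bf16 compute
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",   # shard / offload across available devices
)
```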
-
@danielhanchen Hi Daniel, thanks for your work!
I'm having an error just like in issue #275, but this time while trying to save a tuned version of unsloth/gemma-2-9b-it-bnb-4bit.
>> model.save_p…
-
Hi all,
I hit the following exception when trying to run the Gradio example with: `python -m mlx_vlm.chat_ui --model mlx-community/Qwen2-VL-72B-Instruct-4bit`
```
.../mlx_vlm/chat_ui.py", line …
```
-
When I load the model as follows, it throws the error: `Cannot merge LORA layers when the model is loaded in 8-bit mode`
How can I load the model in 4-bit for inference?
```py
model_path = 'decapoda-resea…
```
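As a minimal sketch of one workaround: load the base model in 4-bit with bitsandbytes and keep the LoRA adapter attached instead of merging it (merging into quantized weights is what triggers the error above); the paths below are placeholders:

```py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_model_path = "path/to/base-model"       # placeholder
lora_adapter_path = "path/to/lora-adapter"   # placeholder

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

base = AutoModelForCausalLM.from_pretrained(
    base_model_path,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_model_path)

# Attach the adapter and skip merge_and_unload(): merging into quantized
# weights is unsupported, but it is not required for inference.
model = PeftModel.from_pretrained(base, lora_adapter_path)
model.eval()
```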
-
# Problem statement
LLM workloads oriented toward best latency are memory-bound: inference speed is limited by access to the model weights through DDR memory. That's why the major optimization technique is weights compres…
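As a hedged illustration of that claim, a roofline-style estimate for batch-1 decoding, where every generated token has to stream roughly all weights from memory (the bandwidth and model size below are assumptions, not measurements):

```py
# Memory-bound decoding: time_per_token ~ weight_bytes / memory_bandwidth
def tokens_per_second(n_params: float, bytes_per_weight: float, bandwidth_gb_s: float) -> float:
    weight_gb = n_params * bytes_per_weight / 1e9
    return bandwidth_gb_s / weight_gb

# Assumed example: 7B parameters, ~100 GB/s of effective DDR bandwidth
print(tokens_per_second(7e9, 2.0, 100))   # fp16 weights  -> ~7 tok/s
print(tokens_per_second(7e9, 0.5, 100))   # 4-bit weights -> ~29 tok/s
```

Compressing weights to 4 bits cuts the bytes moved per generated token by roughly 4x, which is where most of the latency win comes from.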