-
### tl;dr
There has been a change in behavior between [`main` (e51ede12e1b639fd30c8797eb3bbd8b9fb3de826)](https://github.com/lmstudio-ai/mlx-engine/commit/e51ede12e1b639fd30c8797eb3bbd8b9fb3de826) an…
-
Here is the summary:
`unsloth/mistral-7b-v0.3-bnb-4bit` fails with `KeyError: 'layers.0.mlp.down_proj.weight'`
`unsloth/Qwen2.5-7B-Instruct-bnb-4bit` fails with `KeyError: 'layers.0.mlp.down_pro…`
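For reference, a minimal repro sketch, assuming the models are loaded through `mlx_lm` (which mlx-engine builds on); the repo name is taken from the list above:
```
from mlx_lm import load

# Loading the checkpoint fails while mapping weight names:
model, tokenizer = load("unsloth/mistral-7b-v0.3-bnb-4bit")
# -> KeyError: 'layers.0.mlp.down_proj.weight'
```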
-
[https://huggingface.co/upstage/solar-pro-preview-instruct](https://huggingface.co/upstage/solar-pro-preview-instruct)
Upstage released Solar Pro, a new 22B model, and this thing is crazy powerful. I was just won…
-
Environment: torch 2.4, CUDA 12.4, unsloth main.
Below is the code that errored:
```
from unsloth import FastLanguageModel
import torch

model_id = "unsloth/gemma-2-2b-it-bnb-4bit"
# the call was truncated in the report; the standard unsloth load is:
model, tokenizer = FastLanguageModel.from_pretrained(model_name=model_id, load_in_4bit=True)
```
-
I'm currently having issues attempting to quantize, save, and then load the model using HF Transformers.
Is there any known working method for quantizing Aria (preferably to 4-bit)?
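For context, a sketch of the standard Transformers route, assuming the `rhymes-ai/Aria` checkpoint and bitsandbytes NF4; treat it as the pattern being attempted, not a confirmed recipe:
```
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Quantize to 4-bit NF4 on load, then persist the quantized weights.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "rhymes-ai/Aria", quantization_config=bnb, trust_remote_code=True
)
model.save_pretrained("aria-4bit")  # reload later via from_pretrained("aria-4bit")
```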
-
Although it may be out of scope, it would be nice to have an example of computing with 4-bit and 8-bit tensors, to save memory bandwidth; something like the sketch below.
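A sketch with `bitsandbytes.functional` (shapes and dtypes are arbitrary placeholders):
```
import torch
import bitsandbytes.functional as F

w = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)

# 4-bit (NF4) round trip: packs two weights per byte plus per-block absmax stats.
w4, state4 = F.quantize_4bit(w, quant_type="nf4")
w4_deq = F.dequantize_4bit(w4, state4)

# 8-bit blockwise round trip.
w8, state8 = F.quantize_blockwise(w)
w8_deq = F.dequantize_blockwise(w8, state8)
```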
-
Hi. I'm raising this issue because I am experiencing much slower inference times with Gemma-1 models.
> Environment:
> - xformers 0.0.26.post1 pypi_0 pypi
> - unsloth …
-
Hello, my situation is as follows:
I implemented a QLoRA adapter to use with LLMs (currently bloom-560m). It works fine so far; after fine-tuning I get over 90% accuracy on my task. However, after sa…
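For reference, the save/reload pattern to compare against; a sketch assuming PEFT, with a hypothetical adapter directory:
```
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Reload the base model, then attach the saved QLoRA adapter.
base = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")
model = PeftModel.from_pretrained(base, "my-qlora-adapter")  # hypothetical path

# Optionally fold the adapter into the base weights for inference.
model = model.merge_and_unload()
```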
-
From the way it is written in the paper, int4 and int8 quantization are supported, but how do I set them?
According to another [issue](https://github.com/openvla/openvla/issues/10), I should set the c…
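If OpenVLA follows the usual Transformers convention, the switch would be a `BitsAndBytesConfig` passed at load time; a sketch (repo id and model class as in the OpenVLA README, but unverified here):
```
import torch
from transformers import AutoModelForVision2Seq, BitsAndBytesConfig

# int4 on load; use load_in_8bit=True instead for int8.
quant = BitsAndBytesConfig(load_in_4bit=True)
vla = AutoModelForVision2Seq.from_pretrained(
    "openvla/openvla-7b",
    quantization_config=quant,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
```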
-
Currently, `BitsAndBytesLinearQuant4bit` always calls `bitsandbytes.functional.quantize_4bit` for the submodule. This is somewhat touchy for CPU tensors, because `quantize_4bit` only works on GPU tensors …
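One possible workaround, sketched with illustrative names (not the actual implementation): round-trip CPU inputs through the GPU around the call.
```
import torch
import bitsandbytes.functional as F

def quantize_4bit_any(w: torch.Tensor):
    # quantize_4bit only accepts CUDA tensors, so move CPU inputs over first.
    if w.is_cuda:
        return F.quantize_4bit(w)
    wq, state = F.quantize_4bit(w.cuda())
    return wq.cpu(), state  # note: the quant state's absmax tensor stays on the GPU
```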