-
```
        W = W.t().to(dtype)
    else:
        W = layer.weight
    return W, bias
```
Only `layer.weight` is saved; the `bias` is never set before `return W, bias`.
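A minimal sketch of the kind of fix implied here, assuming the fragment above comes from a weight-fetching helper (the function name, the `transpose` flag, and the dtype default are hypothetical):
```
import torch

def get_weight_and_bias(layer: torch.nn.Linear, dtype=torch.float16, transpose=False):
    # Hypothetical helper mirroring the fragment above, but also populating
    # the bias instead of leaving it unset.
    if transpose:
        W = layer.weight.t().to(dtype)
    else:
        W = layer.weight
    bias = layer.bias.to(dtype) if layer.bias is not None else None
    return W, bias
```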
-
Pulled over from the Cats issue tracker; the suggestion is by @hanetzer: https://github.com/teamneoneko/Cats-Blender-Plugin-Unofficial-/issues/172
As the title says. Basic gist: if you use this option as is, you…
-
### Add Hardware Compatibility Check for FP8 Quantization
#### Issue Summary
In our current implementation, we provide three APIs for model computation in FP8 format. However, for dynamic activati…
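For illustration, a minimal sketch of such a compatibility check, assuming a CUDA backend; the helper name and the capability cutoff (SM 8.9 for Ada / SM 9.0 for Hopper, where native FP8 kernels are available) are assumptions rather than the project's actual API:
```
import torch

def device_supports_fp8() -> bool:
    # Hypothetical check: native FP8 (E4M3/E5M2) support generally requires
    # NVIDIA Ada (SM 8.9) or Hopper (SM 9.0) and newer GPUs.
    if not torch.cuda.is_available():
        return False
    return torch.cuda.get_device_capability() >= (8, 9)
```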
-
This is a very good piece of work. Could you please explain why, after adding Ledit in the second stage of training, the weights of `adapter_modules` are fixed and only the weights of `id_encoder` are…
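For readers, a generic PyTorch sketch of the freezing pattern the question refers to; the module definitions below are stand-ins with hypothetical shapes, not the repository's actual `adapter_modules` or `id_encoder`:
```
import torch
import torch.nn as nn

# Stand-in modules with hypothetical shapes.
adapter_modules = nn.ModuleList([nn.Linear(768, 768) for _ in range(2)])
id_encoder = nn.Sequential(nn.Linear(768, 768), nn.ReLU(), nn.Linear(768, 768))

# Second-stage pattern being asked about: freeze adapter_modules,
# train only id_encoder.
for p in adapter_modules.parameters():
    p.requires_grad_(False)

optimizer = torch.optim.AdamW(
    [p for p in id_encoder.parameters() if p.requires_grad], lr=1e-5
)
```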
-
Training the model with `train_script.py`, I save only the weights for every epoch using the following code:
```
# ModelCheckpoint callback to save model weights
checkpoint_callback = ModelChe…
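# (Not part of the original, truncated snippet.) A minimal weights-only,
# per-epoch checkpoint sketch, assuming Keras is the framework in use;
# the filepath pattern is a placeholder.
from tensorflow.keras.callbacks import ModelCheckpoint

checkpoint_callback = ModelCheckpoint(
    filepath="weights_epoch_{epoch:02d}.weights.h5",
    save_weights_only=True,  # store only the weights, not the full model
    save_freq="epoch",       # write a checkpoint at the end of every epoch
)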
-
Hi @danielhanchen,
I am unable to use "unsloth/gemma-2b-bnb-4bit" via vLLM. I am getting the error below while loading the model on an NVIDIA T4 or NVIDIA V100 GPU.
`engine_args = EngineArgs(model="u…
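For context, a minimal sketch of the offline vLLM entry point that fails here, assuming the standard `LLM` API rather than the reporter's exact `EngineArgs` call; the prompt and sampling settings are placeholders:
```
from vllm import LLM, SamplingParams

# Model name taken from the report; loading fails on T4/V100 per the report.
llm = LLM(model="unsloth/gemma-2b-bnb-4bit", dtype="half")
outputs = llm.generate(["Hello"], SamplingParams(max_tokens=16))
```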
-
### Describe the bug
Trying to load a local CKPT file using the `from_single_file()` method fails. It works fine with the .safetensors file from the same repo (Runway ML SD).
### Reproduction
```
import to…
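# (Not part of the original, truncated reproduction.) A minimal hedged sketch
# of the call being described; the local .ckpt path is a placeholder.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_single_file(
    "./v1-5-pruned-emaonly.ckpt",  # local CKPT file that fails per the report
    torch_dtype=torch.float16,
)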
-
Can sparsity and quantization be used simultaneously to further improve inference speed? Do you have any plans in this regard? Looking forward to your reply @robertgshaw2-neuralmagic
-
I am trying to run the INT4 quantization examples from `examples/3.x_api/pytorch/nlp/huggingface_models/language-modeling/quantization/weight_only`, but a package is missing from the requirement.tx…
-
I can successfully deploy llama3-8b-instruct with EAGLE. But there is a problem when deploying qwen2-7b-instruct with EAGLE.
I have converted the EAGLE-Qwen2-7B-Instruct model according to [vllm/mod…
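For reference, a minimal sketch of the kind of speculative-decoding setup being described, assuming vLLM's `speculative_model` / `num_speculative_tokens` engine arguments are available in the version in use; the draft-model path and token count are placeholders, not a confirmed fix:
```
from vllm import LLM

# Hypothetical configuration mirroring the report: Qwen2-7B-Instruct as the
# target model and a locally converted EAGLE draft model.
llm = LLM(
    model="Qwen/Qwen2-7B-Instruct",
    speculative_model="./EAGLE-Qwen2-7B-Instruct-converted",  # placeholder path
    num_speculative_tokens=5,
)
```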