-
### Your current environment
pip3 install vllm==0.4.2 nvidia-ammo==0.7.1
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: …
-
We currently only support continuous-value embeddings (a one-to-many FFN). We should try other approaches, such as supporting quantization.
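A continuous-value embedding of this kind can be sketched as a small FFN that maps one scalar to a vector. The layer sizes and the ReLU activation below are illustrative assumptions, not the project's actual architecture:

```python
import numpy as np

# One-to-many FFN: a single continuous scalar is mapped to a
# d_embed-dimensional embedding vector. Sizes are illustrative.
rng = np.random.default_rng(0)
d_hidden, d_embed = 16, 8
W1 = rng.standard_normal((d_hidden, 1))        # scalar -> hidden
b1 = np.zeros(d_hidden)
W2 = rng.standard_normal((d_embed, d_hidden))  # hidden -> embedding

def embed(x: float) -> np.ndarray:
    """Map one continuous value to a d_embed-dimensional vector."""
    h = np.maximum(W1[:, 0] * x + b1, 0.0)  # ReLU
    return W2 @ h

v = embed(3.14)
print(v.shape)  # (8,)
```

Because the network is a plain FFN, its weights are themselves candidates for the quantization mentioned above.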
-
### Question
I downloaded llava-llama-2-13b from:
https://huggingface.co/liuhaotian/llava-llama-2-13b-chat-lightning-preview
Then I quantized the model to 4-bit using:
```
git clone htt…
```
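4-bit weight quantization can be illustrated generically. The sketch below is plain round-to-nearest with one scale per weight group, not the specific method implemented by the repository cloned above (real 4-bit schemes such as AWQ, GPTQ, or NF4 are more sophisticated):

```python
import numpy as np

def quantize_4bit(w: np.ndarray, group_size: int = 64):
    """Round-to-nearest 4-bit quantization with a per-group scale.

    Generic illustration only; not the repo's actual algorithm.
    """
    w = w.reshape(-1, group_size)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0  # int4 range [-8, 7]
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q * scale

w = np.random.default_rng(0).standard_normal(256).astype(np.float32)
q, s = quantize_4bit(w)
print(q.dtype, q.min(), q.max())  # int8 values confined to the 4-bit range
```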
-
Hello,
I am trying to perform a QAT on a ResNet50 network with BN layers, and I keep getting the following error:
```
ValueError: Shape must be rank 4 but is rank 5 for '{{node batch_normalization_…
```
-
### Checklist
- [X] 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.…
-
Quantizing Stable Diffusion 3.5 models to any kind of k-quants results in large files made up mostly of fp16 weights. That's because
a lot of tensors have width 2432 or 7296, which do not fit in the…
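The size mismatch is easy to check: k-quants in GGUF pack weights into super-blocks of 256 elements (`QK_K` in llama.cpp), and treating the row width as the quantity that must be divisible by 256 is an assumption about how the quantizer decides fallbacks:

```python
# k-quants pack rows in super-blocks of QK_K = 256 elements; rows whose
# width is not a multiple of 256 fall back to a non-k format (e.g. fp16).
QK_K = 256

remainders = {width: width % QK_K for width in (2432, 7296)}
print(remainders)  # both widths leave a remainder of 128
```

Since neither 2432 nor 7296 is a multiple of 256, every such tensor is stored unquantized, which explains the large files.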
-
Hi all,
I was trying to quantize my model but something strange popped up.
I am using TensorFlow v2.14 and tfmot v0.7.5
I have a sub-classed tf.Keras.Model. It contains some custom layers and…
-
I'm trying to compare the accuracies of ResNet and its quantized version. First, I downloaded the resnet_v1 saved_model and used TensorFlow's freeze_graph tool to freeze the graph.
I then followe…
-
### What happened?
Appending `--help` does not print the help text immediately; instead it starts quantization or throws an error:
```shell
./llama-quantize model-bf16.gguf --help IQ4_NL
./llama-quantize model-bf16.g…
```
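A plausible cause (an assumption about the parser, not a reading of llama.cpp's source) is that positionals are consumed left to right before flags are inspected. A robust parser checks for `--help` anywhere in the argument list first; a minimal sketch of that pattern:

```python
from typing import Optional

def parse_args(argv: list) -> Optional[dict]:
    """Return parsed args, or None if help was requested.

    Scanning for --help *before* consuming positionals means
    `prog model.gguf --help IQ4_NL` prints help instead of
    starting work on model.gguf.
    """
    if "--help" in argv or "-h" in argv:
        print("usage: prog <input.gguf> <type>")
        return None
    positionals = [a for a in argv if not a.startswith("-")]
    return {"input": positionals[0], "type": positionals[1]}

print(parse_args(["model-bf16.gguf", "--help", "IQ4_NL"]))  # None: help wins
```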
-
Is it possible to do fine-tuning with the models quantized, i.e. using QLoRA?
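That is exactly what QLoRA does: the base model is loaded in 4-bit (NF4) and only small LoRA adapters are trained in higher precision. A configuration sketch using Hugging Face `transformers`, `bitsandbytes`, and `peft`; the model id and hyperparameters are illustrative placeholders:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 quantization of the frozen base weights (QLoRA-style).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder model id
    quantization_config=bnb_config,
)

# Small trainable LoRA adapters on top of the quantized base.
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # typical for LLaMA-style models
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapters are trainable
```

The resulting model can then be passed to a standard `transformers` `Trainer`; gradients flow only through the LoRA adapters while the 4-bit base weights stay frozen.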