-
Where in the codebase might I find the basic arithmetic / steps for quantizing with NF4?
I’ve had trouble finding a clear definition of the math in existing tutorials, but based on what I see in th…
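The basic NF4 arithmetic is fairly compact: weights are split into blocks, each block is scaled by its absolute maximum into [-1, 1], and each value is snapped to the nearest of 16 fixed levels (the quantiles of a standard normal described in the QLoRA paper). A minimal sketch, assuming the level table below (copied from the bitsandbytes codebook; check the library source for the authoritative values):

```python
import numpy as np

# The 16 NF4 levels (normal-distribution quantiles normalized to [-1, 1]).
# Reproduced here for illustration; the bitsandbytes source is authoritative.
NF4_LEVELS = np.array([
    -1.0, -0.6961928, -0.52507305, -0.39491749,
    -0.28444138, -0.18477343, -0.09105004, 0.0,
    0.0795803, 0.1609302, 0.2461123, 0.33791524,
    0.44070983, 0.562617, 0.72295684, 1.0,
])

def nf4_quantize(block: np.ndarray):
    """Quantize one 1-D block of weights to NF4 indices plus a scale."""
    absmax = np.abs(block).max()           # per-block scale factor
    normalized = block / absmax            # map the block into [-1, 1]
    # nearest-neighbor lookup into the codebook -> one 4-bit index per weight
    idx = np.abs(normalized[:, None] - NF4_LEVELS[None, :]).argmin(axis=1)
    return idx.astype(np.uint8), absmax

def nf4_dequantize(idx: np.ndarray, absmax: float) -> np.ndarray:
    """Dequantize: codebook lookup, rescaled by the block's absmax."""
    return NF4_LEVELS[idx] * absmax

# usage: round-trip a small block of weights
w = np.random.randn(64).astype(np.float32) * 0.02
idx, absmax = nf4_quantize(w)
w_hat = nf4_dequantize(idx, absmax)
```

Real implementations also pack two 4-bit indices per byte and optionally quantize the `absmax` values themselves ("double quantization"), but the scale/lookup arithmetic above is the core of the scheme.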
-
**Describe the bug**
I'm doing transfer learning and would like to (at the end) quantize my model. The problem is that when I try to use the `quantize_model()` function (which is used successfully in…
-
I'm using the quantization script in `examples/quantization` and running into an issue quantizing Mistral 7B to int4_awq. Since Mistral 7B is bfloat16, I need to use the bfloat16 dtype in t…
-
## Description
When performing ResNet18 PTQ using TRT-modelopt, I encountered the following issue when compiling the model with TRT.
First off, I started with a pretrained resnet18 from torchvi…
-
I followed your example `auto_test` with my own depthwise separable CNN. After a few epochs of training, my Keras model has an accuracy of 98.12% on the MNIST test set. After quantization, the NNoM mode…
-
### Your current environment
pip3 install vllm==0.4.2 nvidia-ammo==0.7.1
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: …
-
Problem:
When running the get_feature_importance, it fails with the following error.
```
CatBoostError: /src/catboost/catboost/private/libs/algo/features_data_helpers.h:118: Internal CatBoost E…
-
Hey, I want to quantize my Qwen2 model, but it seems the files are not found even though it clones and installs llama.cpp correctly. When quantizing the model I get this:
```txt
python3: can't …
-
Currently, we don't apply QLoRA to either the output projection or the token embeddings. There's no great reason not to apply quantization to output projections; we simply don't do this due to limitations…
-
I tried to quantize a Llama model (Llama 13B) with SmoothQuant, and found that if I only quantize `LlamaDecoderLayer`, the accuracy does not drop even when directly quantizing weights and activations, bu…