-
The file dbscan_kmeans.py does not contain code for quantizing speech units; in fact, it is empty.
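Since the file is empty, here is a minimal sketch of what a k-means speech-unit quantizer might look like, using scikit-learn. The feature dimension, unit count, and function names are placeholder assumptions, not the repository's actual design.
```
# Hypothetical sketch: quantize speech frame features (e.g. encoder
# embeddings of shape (n_frames, dim)) into discrete units with k-means.
import numpy as np
from sklearn.cluster import KMeans

def train_quantizer(features: np.ndarray, n_units: int = 100) -> KMeans:
    """Fit a k-means codebook on (n_frames, dim) speech features."""
    km = KMeans(n_clusters=n_units, n_init=10, random_state=0)
    km.fit(features)
    return km

def quantize(km: KMeans, features: np.ndarray) -> np.ndarray:
    """Map each frame to the index of its nearest centroid (a unit id)."""
    return km.predict(features)

# Example with random stand-in features: 10k frames of dim 768.
feats = np.random.randn(10_000, 768).astype(np.float32)
km = train_quantizer(feats, n_units=100)
units = quantize(km, feats)  # shape (10000,), integer unit ids
```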
-
Looking for a way to quantize YOLO weights (to 8 or 16 bits). My idea is to speed up calculations as much as possible without hurting accuracy too much, so I would like to experiment with that to…
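As a starting point, a hedged sketch of two PyTorch options; the `torch.hub` entry point is an assumption about which YOLO variant is used. Note that dynamic quantization only covers `nn.Linear` layers, so a conv-heavy YOLO backbone would need static post-training quantization for real INT8 speedups.
```
import copy
import torch

# Assumed model source for illustration; any loaded nn.Module works.
model = torch.hub.load("ultralytics/yolov5", "yolov5s")
model.eval()

# 8-bit: dynamic quantization of Linear layers only (CPU inference).
model_int8 = torch.quantization.quantize_dynamic(
    copy.deepcopy(model), {torch.nn.Linear}, dtype=torch.qint8
)

# 16-bit: cast weights to FP16 (needs a GPU for meaningful speedups).
model_fp16 = model.half().cuda()
```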
-
Currently, TensorRT-LLM requires that LoRA weights dtype match the base model dtype. The check is here:
https://github.com/NVIDIA/TensorRT-LLM/blob/9dbc5b38baba399c5517685ecc5b66f57a177a4c/cpp/tensor…
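A hedged workaround sketch (not an official TensorRT-LLM API): cast the LoRA adapter tensors to the base model's dtype before handing them to TensorRT-LLM, so the dtype check linked above passes. The file names and target dtype are assumptions for illustration.
```
import torch
from safetensors.torch import load_file, save_file

base_dtype = torch.float16  # assumed: dtype the base engine was built with
lora = load_file("adapter_model.safetensors")
# Cast only floating-point tensors; leave any integer metadata untouched.
lora_cast = {
    name: t.to(base_dtype) if t.is_floating_point() else t
    for name, t in lora.items()
}
save_file(lora_cast, "adapter_model_fp16.safetensors")
```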
-
### System Info
- GPU name: L40s
- CUDA: 12.1
```
Wed Jun 5 16:27:21 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14 …
```
-
We currently only support continuous-value embeddings (a one-to-many FFN). We should try other approaches, such as quantizing the values.
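A minimal sketch of the quantizing alternative mentioned above: instead of projecting each continuous value through a FFN, bucket it into one of K bins and look the bin up in a learned embedding table. The bin count and value range are illustrative assumptions.
```
import torch
import torch.nn as nn

class QuantizedValueEmbedding(nn.Module):
    def __init__(self, n_bins: int = 256, dim: int = 64,
                 lo: float = -1.0, hi: float = 1.0):
        super().__init__()
        # n_bins - 1 boundaries partition [lo, hi] into n_bins buckets.
        self.register_buffer("boundaries", torch.linspace(lo, hi, n_bins - 1))
        self.table = nn.Embedding(n_bins, dim)

    def forward(self, values: torch.Tensor) -> torch.Tensor:
        # torch.bucketize maps each continuous value to its bin index.
        idx = torch.bucketize(values, self.boundaries)
        return self.table(idx)

emb = QuantizedValueEmbedding()
out = emb(torch.randn(8, 16))  # -> shape (8, 16, 64)
```
Out-of-range values simply clamp to the first or last bin, which is one design choice among several (another is learning the bin boundaries).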
-
I find that the quantisation losses are higher for GPT-J than for LLaMA, whose losses stay fairly low.
```
2023-06-20 19:05:19 INFO [auto_gptq.modeling._base] Quantizing attn.q_proj in layer 2/28...
…
```
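For reference, a minimal AutoGPTQ run of the kind that produces the log above; the model id and the tiny calibration set are assumptions. The per-layer losses it prints are what the GPT-J vs LLaMA comparison is based on.
```
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_id = "EleutherAI/gpt-j-6b"  # assumed; swap in a LLaMA checkpoint to compare
tok = AutoTokenizer.from_pretrained(model_id)
# A real run should use a few hundred calibration samples, not one.
examples = [tok("The quick brown fox jumps over the lazy dog.",
                return_tensors="pt")]

cfg = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)
model = AutoGPTQForCausalLM.from_pretrained(model_id, cfg)
model.quantize(examples)  # logs per-layer quantisation progress and loss
model.save_quantized("gpt-j-6b-4bit")
```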
-
### Description
Hello,
I have noticed that the examples (e.g. 'detect_objects_file' and 'classify_images_file') do not quantize the input tensor, read from a .rgb file, before running inference. I…
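A hedged sketch of the missing step using the tflite_runtime API: quantize the raw .rgb input with the scale/zero-point the model expects before invoking inference. The file names, the [0, 1] normalization, and the uint8 input assumption are all illustrative.
```
import numpy as np
import tflite_runtime.interpreter as tflite

interpreter = tflite.Interpreter(model_path="model_edgetpu.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]

# Raw RGB bytes, assumed already resized to the model's input shape.
h, w = inp["shape"][1], inp["shape"][2]
rgb = np.fromfile("image.rgb", dtype=np.uint8).reshape(1, h, w, 3)

scale, zero_point = inp["quantization"]
if scale > 0:  # the model expects quantized input
    x = rgb.astype(np.float32) / 255.0   # assumes float model wanted [0, 1]
    q = np.round(x / scale + zero_point)
    rgb = np.clip(q, 0, 255).astype(inp["dtype"])  # assumes uint8 input

interpreter.set_tensor(inp["index"], rgb)
interpreter.invoke()
```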
-
Hi,
@byshiue
I want to quantize a LLaMA model with a long sequence (120K+), but an OOM error is raised, so I hope to solve the OOM problem with multiple GPUs when quantizing the model in convert_checkpoint.py.…
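Not a convert_checkpoint.py flag as far as I know, but one general technique for this OOM is sharding the model across all visible GPUs with `device_map="auto"` (requires the accelerate package) during the calibration forward passes; the model id here is an assumption.
```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # assumed
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",  # shard layers across all visible GPUs
)
# Calibration forward passes now run with layers split over several GPUs,
# so a single 120K-token sequence no longer has to fit on one card.
```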
-
```
Some parameters are on the meta device device because they were offloaded to the cpu.
Quantizing weights: 0%| | 0/1771 [00:00
```
-
Hi @majianjia.
Thank you for your quick responses every time.
I ran an accuracy test of my model using your framework.
It achieved 99.2% with the Caffe framework, but in NNoM it dropped to 95%.
Is…
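A generic, framework-agnostic sketch (not NNoM's own tooling) for locating where such a quantization accuracy drop originates: compare float and quantized activations layer by layer on the same inputs. The `float_acts` / `quant_acts` lists are assumed to be per-layer numpy arrays captured from each pipeline.
```
import numpy as np

def per_layer_error(float_acts, quant_acts):
    """Print where quantization error grows along the network."""
    for i, (f, q) in enumerate(zip(float_acts, quant_acts)):
        err = np.abs(f - q.astype(np.float32))
        print(f"layer {i:2d}: max_err={err.max():.4f} "
              f"mean_err={err.mean():.6f}")
```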