-
### System Info
- CPU architecture: x86_64
- CPU/Host memory size: 250GB total
- GPU properties
- GPU name: 2x NVIDIA A100 80GB
- GPU memory size: 160GB total
- Libraries
- tensorrt @ fi…
-
The model is glm4-v-9b, and the GPUs are a 3090 and a 4090.
Launch command:
xinference launch --model-engine Transformers --model-name glm-4v --size-in-billions 9 --model-format pytorch --quantization none
Problem description:
Right after xinference was upgraded to version 0.12.2, 3…
-
### Reminder
- [X] I have read the README and searched the existing issues.
### System Info
- `llamafactory` version: 0.9.1.dev0
- Platform: Linux-5.15.0-91-generic-x86_64-with-glibc2.35
- …
-
Thanks for sharing the code. A good paper!
May I know how to run multi-bit quantization? Do you have the script or code for it?
And have you tried multi-bit quantization on image classification?
…
-
Thank you for the sample!
I'm not 100% sure, but I believe that the processor is missing input quantization as described at https://www.tensorflow.org/lite/performance/post_training_integer_quant#r…
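
For reference, a minimal sketch of what input quantization for a fully integer model typically looks like, assuming the usual affine mapping `q = round(x / scale) + zero_point` clamped to the int8 range. The function name `quantize_input` and the scale and zero-point values are hypothetical; a real model reports its own parameters (in TensorFlow Lite, via the `quantization` entry of the interpreter's input details).

```python
def quantize_input(values, scale, zero_point, qmin=-128, qmax=127):
    """Map float inputs to integers with q = round(x / scale) + zero_point."""
    quantized = []
    for x in values:
        q = int(round(x / scale)) + zero_point
        # Clamp to the representable integer range (int8 by default).
        quantized.append(max(qmin, min(qmax, q)))
    return quantized


# Hypothetical parameters for inputs normalized to [0, 1];
# read the real values from the model rather than hard-coding them.
scale, zero_point = 1.0 / 255.0, -128
pixels = [0.0, 0.2, 1.0]
quantized_pixels = quantize_input(pixels, scale, zero_point)
print(quantized_pixels)
```

If a preprocessing step feeds raw floats to a model whose input tensor is int8, skipping this step would be one plausible cause of wrong predictions.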
-
**Describe the issue**:
I was doing a quantization tutorial (quantization_quick_start_mnist, quantization_speedup)
However, when I ran the tutorial in a Jupyter notebook, an error occurred.
Both tu…
-
1. Normal float + Double quantization
QLoRA currently uses zero-shot quantization, which differs from GPTQ: it does not require data, but it incurs some performance loss. Theref…
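
To illustrate what "does not require data" means here, a rough sketch of data-free, per-block absmax quantization with round-to-nearest. This is an assumption-laden toy: it uses uniform signed 4-bit levels rather than the NF4 codebook, skips double quantization, and the names `quantize_block` and `dequantize_block` are made up for the example. It is not the QLoRA or bitsandbytes implementation.

```python
def quantize_block(block, bits=4):
    """Quantize one block of weights using only the weights themselves."""
    levels = 2 ** (bits - 1) - 1                 # 7 for signed 4-bit
    absmax = max(abs(w) for w in block) or 1.0   # avoid a zero scale
    scale = absmax / levels
    codes = [int(round(w / scale)) for w in block]
    return codes, scale


def dequantize_block(codes, scale):
    """Recover approximate weights from the integer codes."""
    return [c * scale for c in codes]


weights = [0.12, -0.70, 0.33, 0.04]
codes, scale = quantize_block(weights)
restored = dequantize_block(codes, scale)

# The rounding error is the "performance loss" paid for needing no data.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, max_error)
```

No calibration set appears anywhere in this procedure; a data-dependent method such as GPTQ instead uses sample inputs to decide how to round.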
-
`python scripts/txt2img.py --prompt "a photograph of a huge bear, style of TIME magazine" --plms
/home/grayson/miniconda3/envs/ldm/lib/python3.8/site-packages/torchvision/io/image.py:13: UserWarning…
-
### Describe the issue
I applied dynamic quantization to my model and then tested it on Intel and AMD CPUs. Inference speed improves greatly on the Intel CPU, but not on the AMD …
-
### OpenVINO Version
2024.4.0
### Operating System
Ubuntu 20.04 (LTS)
### Device used for inference
CPU
### OpenVINO installation
Build from source
### Programming Language
…