-
Hi, I read the docs about `zero_quant`, but it seems to require extra training.
And in `deepspeed.init_inference`, `dtype` can be set to `int8`, but the code does nothing for int8. https://github…
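A minimal sketch of the call being discussed, assuming the older keyword style from the DeepSpeed inference tutorial (the model choice and the `mp_size`/`replace_with_kernel_inject` arguments are illustrative, not taken from the report):
```python
# Sketch only: reproduces the reported call shape; dtype=torch.int8 is the
# setting the report says has no effect. Other kwargs are assumptions based
# on the DeepSpeed inference tutorial.
import torch
import deepspeed
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder model
engine = deepspeed.init_inference(
    model,
    mp_size=1,                        # tensor-parallel degree
    dtype=torch.int8,                 # reportedly ignored by the int8 path
    replace_with_kernel_inject=True,  # enable kernel injection
)
```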
-
### Search before asking
- [X] I have searched the Ultralytics YOLO [issues](https://github.com/ultralytics/ultralytics/issues) and [discussions](https://github.com/ultralytics/ultralytics/discussion…
-
**Describe the issue**
Please provide details relating to the issue you're hitting. If it is related to performance, accuracy, or other model issues with bringing your own model to Qualcomm AI Hub, to…
-
### Feature request
Hi! I’ve been researching LLM quantization recently ([this paper](https://arxiv.org/abs/2405.14852)) and noticed a potentially important issue that arises when using LLMs with 1-…
-
I want to use INT8 matmul, and the code/output is as follows:
### Code
```python
import bitblas
import torch
bitblas.set_log_level("Debug")
matmul_config = bitblas.MatmulConfig(
    M=16,  # M dime…
```
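For reference, a complete INT8 configuration in the style of the BitBLAS README might look like the sketch below; the N/K shapes and every field value are illustrative assumptions, not recovered from the truncated snippet above.
```python
# Sketch of an INT8 x INT8 -> INT32 matmul, modeled on the BitBLAS README.
# All shapes and dtypes here are assumptions for illustration.
import bitblas
import torch

config = bitblas.MatmulConfig(
    M=16,                 # rows of A
    N=1024,               # output columns (rows of the transposed weight)
    K=1024,               # reduction dimension
    A_dtype="int8",       # activation dtype
    W_dtype="int8",       # weight dtype
    accum_dtype="int32",  # accumulate in int32 to avoid overflow
    out_dtype="int32",    # output dtype
    layout="nt",          # A non-transposed, W transposed
)
matmul = bitblas.Matmul(config=config)

A = torch.randint(-8, 8, (16, 1024), dtype=torch.int8).cuda()
W = torch.randint(-8, 8, (1024, 1024), dtype=torch.int8).cuda()
out = matmul(A, W)  # int32 result of shape (16, 1024)
```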
-
As mentioned in this [issue](https://github.com/NVIDIA/TensorRT-LLM/issues/110), the release branch does not support bfloat16 + weight_only_int8 quantization, while this feature is already su…
-
Hi, for a model as big as 7 GB, does transformers support export to ONNX? Is there any tutorial for big models?
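One possible route, sketched under the assumption that Hugging Face Optimum is acceptable tooling (the model id is a placeholder):
```python
# Sketch: export via Optimum's ONNX Runtime integration. The model id is a
# placeholder; for graphs over 2 GB, ONNX spills weights into external data
# files alongside the .onnx graph automatically.
from optimum.onnxruntime import ORTModelForCausalLM

model = ORTModelForCausalLM.from_pretrained("some-org/some-7gb-model", export=True)
model.save_pretrained("onnx_model/")
```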
-
### Describe the issue
Hi IPEX team,
I have an application where I want to serve multiple models concurrently, and I want to share weights across concurrent instances. I normally do this with `tor…
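The reference above is cut off, but assuming it points at PyTorch's shared-memory facilities, a minimal sketch of that general pattern (an assumption, not the reporter's actual code) might be:
```python
# Sketch: share one copy of the weights across worker processes by moving
# parameters into shared memory before spawning workers. This is an assumed
# reading of the truncated `tor…` reference, shown in plain PyTorch.
import torch
import torch.multiprocessing as mp

def worker(model: torch.nn.Module) -> None:
    # Each process sees the same underlying parameter storage; no extra copy.
    with torch.no_grad():
        print(model(torch.randn(1, 16)).shape)

if __name__ == "__main__":
    model = torch.nn.Linear(16, 4)
    model.share_memory()  # move parameters/buffers to shared memory
    procs = [mp.Process(target=worker, args=(model,)) for _ in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```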
-
Please describe your problem in **English** if possible; it will be helpful to more people.
**Describe the bug**
A clear and concise description of what the bug is.
**To Reproduce**
Steps to repr…
-
### 1. System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Win 10 22H2 (but reproducible elsewhere)
- TensorFlow installation (pip package or built from source): pip pack…