-
### Is there an existing issue for this?
- [X] I have searched the existing issues
### Current Behavior
Model quantization fails; it works correctly at FP16 precision. The post-quantization error message is shown below, and the GPU is a P100 16G. How can this be resolved?
RuntimeError: CUDA Error: no kernel ima…
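For context, "no kernel image is available" almost always means the quantized CUDA kernels were not compiled for the GPU's architecture; the P100 is sm_60, while many int8 backends only ship kernels for newer architectures. A minimal diagnostic sketch (the exact minimum architecture depends on the library and build):

```python
import torch

# Print the compute capability of the first GPU; compare it against the
# architectures your quantization backend was built for.
major, minor = torch.cuda.get_device_capability(0)
print(f"GPU 0 compute capability: sm_{major}{minor}")
```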
-
https://forum.opennmt.net/t/ctranslate2-on-opennmt-py-server/4175/8
-
As I was reviewing https://github.com/pytorch/ao/pull/223
I was reminded of this PR https://github.com/pytorch/ao/pull/214
And I'd be curious what range of floating point numbers we can just exp…
-
### System Info
4*A800 80G
### Who can help?
@Tracin
### Information
- [X] The official example scripts
- [ ] My own modified scripts
### Tasks
- [X] An officially supported tas…
-
### 🚀 The feature, motivation and pitch
I propose implementing int8 quantization support for vLLM, focusing initially on the KV cache. This feature will allow users to run larger models or increase b…
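For intuition, a minimal sketch of the numerics behind int8 KV-cache quantization, assuming symmetric per-tensor scales; the helper names are hypothetical, and vLLM's actual implementation (fused kernels, scale granularity) would differ:

```python
import torch

def quantize_kv_int8(kv: torch.Tensor):
    # Symmetric per-tensor quantization: one fp32 scale per cache tensor.
    scale = kv.abs().max().float().clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(kv.float() / scale), -128, 127).to(torch.int8)
    return q, scale

def dequantize_kv_int8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Recover an fp16 approximation when the cache entry is read back.
    return (q.float() * scale).half()

kv = torch.randn(2, 8, 128, 64, dtype=torch.float16)  # (batch, heads, seq, head_dim)
q, s = quantize_kv_int8(kv)
err = (dequantize_kv_int8(q, s).float() - kv.float()).abs().max()
print(f"max quantization error: {err:.4f}")
```

Storing the cache in int8 halves its memory footprint relative to fp16, which is what frees room for larger models or bigger batches.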
-
![image](https://github.com/InternLM/xtuner/assets/145842232/83f12831-573f-4a42-8f19-905e8a5d57e6)
How do I solve this problem? The error is shown above, and the config is attached below.
# Copyri…
-
I tried to modify your example code to run this model on a low-VRAM card with a BNB 4-bit or 8-bit quantization config. When using a BNB 4-bit config like the one below:
```python
qnt_config = BitsAndBytesConfig(load…
```
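For reference, a typical complete 4-bit configuration looks like the sketch below; the parameter values (NF4 with fp16 compute) are illustrative rather than the poster's original, truncated config, and the model id is a placeholder:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Illustrative 4-bit NF4 config; tune the compute dtype and double
# quantization to taste.
qnt_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "model-name-here",            # placeholder model id
    quantization_config=qnt_config,
    device_map="auto",            # requires `accelerate`
)
```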
-
Hello everyone. I am fine-tuning on 4 A100s with batch=1, but I still get an `out of memory` error. What could be causing this?
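As a first diagnostic, it can help to confirm how memory is actually distributed across the four cards; a quick sketch using PyTorch's memory query (an assumption, since the post does not show the training setup):

```python
import torch

# OOM at batch=1 usually means model states (weights, optimizer, activations)
# exceed a single card rather than the batch being too large; check whether
# memory use is balanced across GPUs.
for i in range(torch.cuda.device_count()):
    free, total = torch.cuda.mem_get_info(i)
    print(f"GPU {i}: {free / 2**30:.1f} GiB free of {total / 2**30:.1f} GiB")
```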
-
# Summary
* We (engineering at @neuralmagic) are working on support for int8 quantized activations.
* This RFC proposes an _incremental_ approach to quantization, where the initial support for q…
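For intuition, a reference implementation of a W8A8 linear with dynamically quantized activations; this is illustrative only, and the RFC's actual kernels and scale granularity (per-tensor vs. per-token/per-channel) may differ:

```python
import torch

def int8_linear_ref(x: torch.Tensor, w_q: torch.Tensor, w_scale: torch.Tensor):
    # Per-token activation scales computed at runtime from the max magnitude.
    x_scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 127.0
    x_q = torch.clamp(torch.round(x / x_scale), -128, 127).to(torch.int8)
    # Dequantize-then-matmul reference; production kernels keep the GEMM in int8.
    return (x_q.float() * x_scale) @ (w_q.float() * w_scale).t()

# Per-output-channel int8 weights, quantized offline.
w = torch.randn(256, 512)
w_scale = w.abs().amax(dim=-1, keepdim=True) / 127.0
w_q = torch.clamp(torch.round(w / w_scale), -128, 127).to(torch.int8)

x = torch.randn(4, 512)
print(int8_linear_ref(x, w_q, w_scale).shape)  # torch.Size([4, 256])
```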
-
**Describe the bug**
If you input `np.zeros((1, 120, 28, 28))` to [this model](http://shinh.skr.jp/t/quant_wrong.onnx), the output from CPU does not match the output from CUDA. I believe CUDA is righ…
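A repro along these lines should show the divergence; the input dtype and local model path are assumptions, since the report only gives the shape:

```python
import numpy as np
import onnxruntime as ort

x = np.zeros((1, 120, 28, 28), dtype=np.float32)  # dtype assumed

def run(providers):
    # Run the downloaded model under a single execution provider.
    sess = ort.InferenceSession("quant_wrong.onnx", providers=providers)
    name = sess.get_inputs()[0].name
    return sess.run(None, {name: x})[0]

out_cpu = run(["CPUExecutionProvider"])
out_cuda = run(["CUDAExecutionProvider"])
print(np.abs(out_cpu - out_cuda).max())  # nonzero indicates the mismatch
```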