-
### Is there an existing issue for this?
- [X] I have searched the existing issues
### Current Behavior
Model quantization fails; it works correctly at FP16 precision. The post-quantization error message is shown below, and the GPU is a P100 16G. How can this be resolved?
RuntimeError: CUDA Error: no kernel ima…
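For context, "no kernel image is available" almost always means the quantized CUDA kernels were not compiled for the GPU's architecture; the P100 is sm_60, while many int8 backends only ship kernels for newer architectures. A minimal diagnostic sketch (the exact minimum architecture depends on the library and build):

```python
import torch

# Print the compute capability of the first GPU; compare it against the
# architectures your quantization backend was built for.
major, minor = torch.cuda.get_device_capability(0)
print(f"GPU 0 compute capability: sm_{major}{minor}")
```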
-
https://forum.opennmt.net/t/ctranslate2-on-opennmt-py-server/4175/8
-
As I was reviewing https://github.com/pytorch/ao/pull/223
I was reminded of this PR https://github.com/pytorch/ao/pull/214
And I'd be curious what range of floating point numbers we can just exp…
-
### System Info
4*A800 80G
### Who can help?
@Tracin
### Information
- [X] The official example scripts
- [ ] My own modified scripts
### Tasks
- [X] An officially supported tas…
-
### 🚀 The feature, motivation and pitch
I propose implementing int8 quantization support for vLLM, focusing initially on the KV cache. This feature will allow users to run larger models or increase b…
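For intuition, a minimal sketch of the numerics behind int8 KV-cache quantization, assuming symmetric per-tensor scales; the helper names are hypothetical, and vLLM's actual implementation (fused kernels, scale granularity) would differ:

```python
import torch

def quantize_kv_int8(kv: torch.Tensor):
    # Symmetric per-tensor quantization: one fp32 scale per cache tensor.
    scale = kv.abs().max().float().clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(kv.float() / scale), -128, 127).to(torch.int8)
    return q, scale

def dequantize_kv_int8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Recover an fp16 approximation when the cache entry is read back.
    return (q.float() * scale).half()

kv = torch.randn(2, 8, 128, 64, dtype=torch.float16)  # (batch, heads, seq, head_dim)
q, s = quantize_kv_int8(kv)
err = (dequantize_kv_int8(q, s).float() - kv.float()).abs().max()
print(f"max quantization error: {err:.4f}")
```

Storing the cache in int8 halves its memory footprint relative to fp16, which is what frees room for larger models or bigger batches.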
-
![image](https://github.com/InternLM/xtuner/assets/145842232/83f12831-573f-4a42-8f19-905e8a5d57e6)
How do I solve this problem? The error is shown above, and the config is attached below.
# Copyri…
-
I tried to modify your example code to run this model on a low-VRAM card with a BNB 4-bit or 8-bit quantization config. When using a BNB 4-bit config like the one below:
```python
qnt_config = BitsAndBytesConfig(load…
```
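For reference, a typical complete 4-bit configuration looks like the sketch below; the parameter values (NF4 with fp16 compute) are illustrative rather than the poster's original, truncated config, and the model id is a placeholder:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Illustrative 4-bit NF4 config; tune the compute dtype and double
# quantization to taste.
qnt_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "model-name-here",            # placeholder model id
    quantization_config=qnt_config,
    device_map="auto",            # requires `accelerate`
)
```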
-
Hello everyone. I am fine-tuning on 4 A100s with batch=1, but I still get an `out of memory` error. What could be causing this?
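As a first diagnostic, it can help to confirm how memory is actually distributed across the four cards; a quick sketch using PyTorch's memory query (an assumption, since the post does not show the training setup):

```python
import torch

# OOM at batch=1 usually means model states (weights, optimizer, activations)
# exceed a single card rather than the batch being too large; check whether
# memory use is balanced across GPUs.
for i in range(torch.cuda.device_count()):
    free, total = torch.cuda.mem_get_info(i)
    print(f"GPU {i}: {free / 2**30:.1f} GiB free of {total / 2**30:.1f} GiB")
```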
-
# Summary
* We (engineering at @neuralmagic) are working on support for int8 quantized activations.
* This RFC proposes an _incremental_ approach to quantization, where the initial support for q…
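For intuition, a reference implementation of a W8A8 linear with dynamically quantized activations; this is illustrative only, and the RFC's actual kernels and scale granularity (per-tensor vs. per-token/per-channel) may differ:

```python
import torch

def int8_linear_ref(x: torch.Tensor, w_q: torch.Tensor, w_scale: torch.Tensor):
    # Per-token activation scales computed at runtime from the max magnitude.
    x_scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 127.0
    x_q = torch.clamp(torch.round(x / x_scale), -128, 127).to(torch.int8)
    # Dequantize-then-matmul reference; production kernels keep the GEMM in int8.
    return (x_q.float() * x_scale) @ (w_q.float() * w_scale).t()

# Per-output-channel int8 weights, quantized offline.
w = torch.randn(256, 512)
w_scale = w.abs().amax(dim=-1, keepdim=True) / 127.0
w_q = torch.clamp(torch.round(w / w_scale), -128, 127).to(torch.int8)

x = torch.randn(4, 512)
print(int8_linear_ref(x, w_q, w_scale).shape)  # torch.Size([4, 256])
```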
-
**Describe the bug**
If you input `np.zeros((1, 120, 28, 28))` to [this model](http://shinh.skr.jp/t/quant_wrong.onnx), the output from CPU does not match the output from CUDA. I believe CUDA is righ…
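A repro along these lines should show the divergence; the input dtype and local model path are assumptions, since the report only gives the shape:

```python
import numpy as np
import onnxruntime as ort

x = np.zeros((1, 120, 28, 28), dtype=np.float32)  # dtype assumed

def run(providers):
    # Run the downloaded model under a single execution provider.
    sess = ort.InferenceSession("quant_wrong.onnx", providers=providers)
    name = sess.get_inputs()[0].name
    return sess.run(None, {name: x})[0]

out_cpu = run(["CPUExecutionProvider"])
out_cuda = run(["CUDAExecutionProvider"])
print(np.abs(out_cpu - out_cuda).max())  # nonzero indicates the mismatch
```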