-
For milestone in https://github.com/Samsung/ONE/projects/9#card-79474017
## Candidate 1
[one-cmds PyTorch (or ONNX) LSTM op import fails · Issue #8217](https://github.com/Samsung/ONE/issues/82…
-
I'm testing quantized training and inference with torchrec, and I found that the quantized model sometimes produces wrong output at certain world sizes. I compared the outputs of a distributed model an…
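One plausible (hypothetical) source of world-size-dependent divergence is that each shard derives its own quantization scale, so the numerics change with the sharding. A toy plain-Python sketch of that effect (made-up names and scheme, not torchrec's actual code):

```python
def quantize(values, scale):
    """Symmetric int8 quantization with a given scale (toy scheme)."""
    return [max(-128, min(127, round(v / scale))) for v in values]

def dequantize(qvalues, scale):
    return [q * scale for q in qvalues]

def scale_for(values):
    """Per-tensor scale derived from the max absolute value."""
    return max(abs(v) for v in values) / 127

def roundtrip_global(table):
    """Quantize the whole table with one global scale."""
    s = scale_for(table)
    return dequantize(quantize(table, s), s)

def roundtrip_sharded(table, world_size):
    """Each rank quantizes its own shard with its own scale."""
    n = len(table) // world_size
    out = []
    for rank in range(world_size):
        shard = table[rank * n:(rank + 1) * n]
        s = scale_for(shard)
        out.extend(dequantize(quantize(shard, s), s))
    return out
```

With a table mixing small and large values, the global scale flushes the small entries to zero while per-shard scales preserve them, so results depend on the world size.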
-
I tried to run llava-v1.6-34b-hf-awq and succeeded, but how can I run the test for Llava-v1.5 ConditionalGeneration?
https://github.com/casper-hansen/AutoAWQ/pull/250
The bug in the example is likely:
1. ma…
-
**Describe the bug**
Adding `"zero_quantized_weights": true,` leads to a crash:
```
[35:1]: warnings.warn(
[35:1]:Traceback (most recent call last):
[35:1]: File "/data/env/lib/repos/retro-l…
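```

For reference, the ZeRO++ style config block where this flag lives (a hedged sketch; the stage and the gradient flag are assumptions, not taken from the report):

```json
{
  "zero_optimization": {
    "stage": 3,
    "zero_quantized_weights": true,
    "zero_quantized_gradients": true
  }
}
```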
-
Hi!
When trying to quantize the new DeepSeek Coder V2 https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Instruct I got the following error:
```
!! Warning, unknown architecture: DeepseekV2F…
```
-
Do you only use the chat formatting from FastChat, or also inference? FastChat already supports GPTQ:
https://github.com/lm-sys/FastChat/blob/main/docs/gptq.md
My other idea was to edit loading parame…
-
Following [airockchip](https://github.com/airockchip)/[yolov5](https://github.com/airockchip/yolov5), I ran
```shell
python export.py --rknpu rk3399pro
```
which successfully exported yolov5s.torchscript; then in rknn-toolkit-1…
-
It's useful to be able to run a quantized transformer model exported to TorchScript on CUDA, even if some quantized operators are executed via dequantize → float32 op → requantize (some sort of temporary…
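The dequant/requant fallback described here can be sketched in plain Python with a toy int8 affine scheme (illustrative only; PyTorch's actual quantized kernels and observers are more involved):

```python
def affine_quantize(x, scale, zero_point):
    """Map a float to int8 under a toy affine scheme."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))

def affine_dequantize(q, scale, zero_point):
    return (q - zero_point) * scale

def quantized_relu_fallback(q_in, scale, zero_point):
    """Fallback path: dequantize to float, run the float op, requantize.

    Stands in for running a float32 kernel on a backend (e.g. CUDA)
    that lacks the native quantized implementation of the op.
    """
    x = affine_dequantize(q_in, scale, zero_point)
    y = max(x, 0.0)  # the float32 op (ReLU here)
    return affine_quantize(y, scale, zero_point)
```

The result matches what a native quantized ReLU would produce under the same scale and zero point, at the cost of the extra conversions.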
-
We have an existing model [neuralmagic/SparseLLama-2-7b-ultrachat_200k-pruned_50.2of4](https://huggingface.co/neuralmagic/SparseLLama-2-7b-ultrachat_200k-pruned_50.2of4) that has already been pruned t…
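For context, the "2of4" in the model name refers to 2:4 semi-structured sparsity: at most two nonzero values in every group of four consecutive weights. A plain-Python checker (an illustrative sketch, not the pruning code itself):

```python
def is_2_of_4_sparse(weights):
    """Check 2:4 semi-structured sparsity: every group of four
    consecutive weights has at most two nonzero entries."""
    if len(weights) % 4 != 0:
        return False
    for i in range(0, len(weights), 4):
        group = weights[i:i + 4]
        if sum(1 for w in group if w != 0) > 2:
            return False
    return True
```

This pattern is what lets NVIDIA sparse tensor cores skip the zeroed weights at a fixed 50% ratio.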
-
**Describe the bug**
Loading the llama2 70b model in 4-bit (bitsandbytes) and then distributing it by calling deepspeed.initialize gives the following error:
```
------------------------…
```