-
# Error message:
```
(rkllm) python rkllm-toolkit/examples/test.py
INFO: rkllm-toolkit version: 1.1.2
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
Loa…
```
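(An aside, not part of the original log: this warning comes from Hugging Face `transformers`; `trust_remote_code` only takes effect when the model is loaded through the `Auto*` factory classes, for example:)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code is honored by the Auto* classes, which may need to run
# custom modeling code shipped with the checkpoint; passing it to a concrete
# model class is silently ignored, producing the warning above
tokenizer = AutoTokenizer.from_pretrained("path/to/checkpoint", trust_remote_code=True)  # placeholder path
model = AutoModelForCausalLM.from_pretrained("path/to/checkpoint", trust_remote_code=True)
```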
-
**Describe the bug**
Attempting to save post-training-quantized (PTQ) `TorchVision` models using the `ptq_benchmark_torchvision.py` script, after amending the script to save the model with `export_torch_qcdq` as a final ste…
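For context, a minimal sketch of the export step described above, assuming a Brevitas version that exposes `export_torch_qcdq` from `brevitas.export` (the exact signature may vary across releases; `quant_model` stands in for the calibrated PTQ model):

```python
import torch
from brevitas.export import export_torch_qcdq

# quant_model: placeholder for the calibrated PTQ TorchVision model
dummy_input = torch.randn(1, 3, 224, 224)  # ImageNet-shaped tracing input
export_torch_qcdq(quant_model, args=dummy_input, export_path="model_qcdq.pt")
```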
-
Hi,
When we assign numbers in the Experiment column of the manifest (1, 2, 3, …, 10, 11, 12, …, 20, 21, 22, …), the order in which the samples appear in a quant table is 1, 10, 11, …, 2, 20, 21, …, etc. Is…
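If the underlying question is why this happens: plain string sorting is lexicographic, so "10" sorts before "2". A generic illustration (not DIA-NN code) of the effect, and of the usual zero-padding workaround:

```python
labels = [f"Exp_{i}" for i in (1, 2, 10, 11, 20)]
print(sorted(labels))
# ['Exp_1', 'Exp_10', 'Exp_11', 'Exp_2', 'Exp_20']  <- lexicographic order

# zero-padding the experiment numbers restores the intended numeric order
padded = [f"Exp_{i:03d}" for i in (1, 2, 10, 11, 20)]
print(sorted(padded))
# ['Exp_001', 'Exp_002', 'Exp_010', 'Exp_011', 'Exp_020']
```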
-
Hey, I am learning how to use Vitis AI 3.0 and trying to run the Vitis AI 3.0 quickstart tutorial for `VCK190` resnet18.
In the "PyTorch tutorial" section:
`
Step 7 : Next, let’s run…
-
I can provide the execution provider like this:
`config.StaticQuantConfig(calibration_data_reader=data_reader, quant_format=QuantFormat.QOperator, execution_provider="DmlExecutionProvider")`, but there…
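For reference, a hedged sketch of how such a config is typically consumed, assuming the `quantize(model_input, model_output, quant_config)` entry point from `onnxruntime.quantization` (paths, the input name, and the data reader are placeholders, and execution-provider support may vary by onnxruntime version):

```python
import numpy as np
from onnxruntime.quantization import CalibrationDataReader, QuantFormat, StaticQuantConfig, quantize

class RandomDataReader(CalibrationDataReader):
    """Placeholder reader: a handful of random samples for calibration."""
    def __init__(self, input_name="input", shape=(1, 3, 224, 224), n=8):
        self.samples = iter(np.random.rand(n, *shape).astype(np.float32))
        self.input_name = input_name

    def get_next(self):
        batch = next(self.samples, None)
        return None if batch is None else {self.input_name: batch}

cfg = StaticQuantConfig(
    calibration_data_reader=RandomDataReader(),
    quant_format=QuantFormat.QOperator,
    execution_provider="DmlExecutionProvider",  # as in the snippet above
)
quantize("model.onnx", "model_int8.onnx", cfg)  # placeholder paths
```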
-
### 🐛 Describe the bug

```
python torchchat.py generate stories110M --quant torchchat/quant_config/cuda.json --prompt "It was a dark and stormy night, and"
Using device=cuda Tesla T4
Loading model...…
```
-
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat2 in method wrapper_CUDA_bmm)
```python
from aw…
```
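As an aside (a generic sketch, not from the snippet above): this error usually means one operand lives on the CPU while the other lives on the GPU; moving both onto the same device before the batched matmul resolves it.

```python
import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
a = torch.randn(2, 3, 4)                 # created on the CPU
b = torch.randn(2, 4, 5, device=device)  # created on the GPU (if available)
out = torch.bmm(a.to(device), b)         # both operands on one device -> no RuntimeError
```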
-
### Problem
Given a quantized model (for example llama2-7B-nf4), vanilla inference dequantizes the weights to fp16 or bf16 before computing. Does exllamav2 support no-dequant inference?
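To make the premise concrete, a toy sketch (not exllamav2 internals): the "vanilla" path materializes a full 16-bit copy of the 4-bit weight before the matmul, whereas a no-dequant kernel would expand values on the fly inside the kernel without ever storing the full-precision weight.

```python
import torch

codebook = torch.linspace(-1.0, 1.0, 16).to(torch.bfloat16)  # stand-in for the 16 NF4 levels
idx = torch.randint(0, 16, (1024, 1024))                     # one 4-bit index per weight
scale = torch.full((1024, 1), 0.01, dtype=torch.bfloat16)    # toy per-row scale

w = codebook[idx] * scale                       # dequantized weight, materialized in bf16
x = torch.randn(1, 1024, dtype=torch.bfloat16)
y = x @ w.T                                     # the actual compute runs in bf16
```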
-
Device: NVIDIA NX
1. Using trtexec with `--fp16`:
`/usr/src/tensorrt/bin/trtexec --onnx=best.onnx --workspace=4096 --saveEngine=best.engine --fp16`
The measured inference time is 36.8 ms.
2. Using pytorch_qua…
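For reference (not from the original post): when the pytorch_quantization path exports a QDQ ONNX model, the TensorRT engine is typically built with the `--int8` flag, e.g. `/usr/src/tensorrt/bin/trtexec --onnx=best_qdq.onnx --workspace=4096 --saveEngine=best_int8.engine --int8` (file names here are placeholders).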
-
Hello! I am running DiaNN 1.8 on Windows and noticed something odd; I was wondering whether there is an explanation for it. I basically have two runs.
All in One: running all the samples at once …