-
### Your current environment
[My Environment](https://github.com/vllm-project/vllm/files/14937936/env.txt)
The OpenAI-compatible API server was launched using this command:
```
VLLM_WORKER_MULTIPROC_METHOD=spawn VLLM_NCC…
```
-
Hi Andre, I hope you’re doing well! I’d like to get your advice on something. If I decide to change my dataset, what adjustments would I need to make throughout the workflow? Are there specific things…
-
Hi, I am currently working with a custom TensorFlow model. So far, quantization has been successful, but I would like to evaluate the accuracy of the quantized model.
The following command is for test…
-
**Describe the bug**
I had a full-precision onnxruntime session. Then I loaded my network and quantized it with:
```
from onnxruntime.quantization import quantize, QuantizationMode
quantized_model = …
```
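For reference, the `quantize`/`QuantizationMode` entry point used above is an older onnxruntime API; recent releases expose `quantize_dynamic`/`quantize_static` instead. Independent of which API is used, the arithmetic underneath is affine uint8 quantization. Below is a minimal pure-Python sketch of that mapping, illustrative only and not onnxruntime's actual implementation:

```python
# Illustrative affine (asymmetric) uint8 quantization, the scheme that
# INT8 quantizers such as onnxruntime's apply per tensor. Pure Python;
# this is a conceptual sketch, not onnxruntime internals.

def quantize_affine(values, qmin=0, qmax=255):
    """Map floats to uint8 with one per-tensor scale and zero point."""
    lo, hi = min(values), max(values)
    lo, hi = min(lo, 0.0), max(hi, 0.0)        # range must include 0
    scale = (hi - lo) / (qmax - qmin) or 1.0   # avoid divide-by-zero
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize_affine(q, scale, zero_point):
    return [(qi - zero_point) * scale for qi in q]

weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
q, scale, zp = quantize_affine(weights)
recovered = dequantize_affine(q, scale, zp)
# Round-trip error per element is on the order of the scale.
```

Dynamic quantization derives parameters like these for activations at run time; static quantization fixes them ahead of time from calibration data.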
-
**Describe the bug**
It seems like any MinkowskiConvolution with stride > 1 produces non-deterministic features when executed on the GPU and no shared coordinate manager is used.
Running on the CP…
-
## 🐛 Bug
Hello, I am trying to quantize a model. I have done post-training static quantization following the tutorial. During the conversion, I:
- define my model: `mymodel = model(cfg)`
…
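In the eager-mode tutorial flow (attach a qconfig, `prepare` to insert observers, run calibration batches, then `convert`), what the inserted observers do during calibration is essentially track a running min/max and turn it into one `(scale, zero_point)` pair when the model is converted. A stdlib sketch of that step, using hypothetical names rather than PyTorch's actual classes:

```python
# Sketch of what a min/max observer in post-training static quantization
# computes during calibration: track the running range of activations,
# then derive one (scale, zero_point) pair for signed int8. Hypothetical
# helper, not PyTorch's implementation.

class MinMaxObserver:
    def __init__(self):
        self.lo = float("inf")
        self.hi = float("-inf")

    def observe(self, batch):
        self.lo = min(self.lo, min(batch))
        self.hi = max(self.hi, max(batch))

    def qparams(self, qmin=-128, qmax=127):
        lo, hi = min(self.lo, 0.0), max(self.hi, 0.0)  # range includes 0
        scale = (hi - lo) / (qmax - qmin)
        zero_point = round(qmin - lo / scale)
        return scale, zero_point

obs = MinMaxObserver()
for batch in ([0.1, 0.9, 0.4], [0.2, 1.5, 0.0], [0.7, 0.3, 1.1]):
    obs.observe(batch)        # "calibration" passes over sample data
scale, zp = obs.qparams()     # frozen at convert time
```

This is why the calibration set matters: an unrepresentative range directly skews the scale used for every later inference.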
-
A use case: storing a full backtracking pointer matrix can be acceptable for Needleman-Wunsch / CTC alignment (a 4x memory saving compared to a uint8 representation) if a 2-bit data type is used. Currently it's possible to…
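Until such a 2-bit dtype exists, the 4x saving described here can be approximated by hand: pack four 2-bit pointers (e.g. the diag/up/left/stop moves of a Needleman-Wunsch traceback) into each byte. A stdlib sketch with hypothetical helper names:

```python
# Pack four 2-bit backtracking pointers (values 0-3) into each byte:
# a 4x saving over storing one uint8 per pointer. Hypothetical helpers,
# not an existing library API.

def pack2bit(pointers):
    out = bytearray((len(pointers) + 3) // 4)
    for i, p in enumerate(pointers):
        assert 0 <= p <= 3, "only 2-bit values fit"
        out[i // 4] |= p << (2 * (i % 4))   # 4 pointers per byte
    return bytes(out)

def unpack2bit(packed, n):
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(n)]

ptrs = [0, 1, 2, 3, 3, 2, 1]   # one row of a pointer matrix
packed = pack2bit(ptrs)         # 7 pointers fit in 2 bytes instead of 7
```

The bit shifts make random access cheap, so a traceback can still walk the packed matrix cell by cell.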
-
I tried to run a BERT model on a Jetson (Ampere GPU) to evaluate PTQ (post-training quantization) INT8 accuracy on the SQuAD dataset, but it fails with the error below while building the engine:
WA…
-
Reduce neural network size by pruning and quantization for better performance
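As a toy illustration of the two techniques named above: magnitude pruning zeroes the smallest-magnitude weights, and quantization stores the survivors as low-bit integers. Hypothetical helpers, illustrative only and not any framework's API:

```python
# Toy illustration of the two size-reduction techniques: magnitude
# pruning (zero out the smallest-magnitude weights) followed by coarse
# symmetric linear quantization of what remains. Illustrative only.

def prune_by_magnitude(weights, sparsity):
    """Zero (at least) the `sparsity` fraction of smallest-|w| weights."""
    k = int(len(weights) * sparsity)
    threshold = sorted(abs(w) for w in weights)[k - 1] if k else -1.0
    return [0.0 if abs(w) <= threshold else w for w in weights]

def quantize_symmetric(weights, bits=8):
    """Symmetric linear quantization to signed `bits`-bit integers."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / qmax or 1.0
    return [round(w / scale) for w in weights], scale

w = [0.02, -0.9, 0.05, 0.6, -0.01, 0.3]
pruned = prune_by_magnitude(w, sparsity=0.5)   # half the weights zeroed
q, scale = quantize_symmetric(pruned)          # int8 codes plus one scale
```

Pruned zeros compress extremely well (or can be stored sparsely), and the quantized survivors need one byte each plus a single shared scale.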
-
**Describe the bug**
The network I use is cascade_r101v1_fpn_1x.py. I then applied the quantization-during-training method and quantized cascade_r101v1_fpn_1x.py based on the quantization settings of the faste…