-
### System Info
```shell
Optimum: latest from GitHub
Python: 3.8
Platform: NVIDIA V100
```
### Who can help?
@JingyaHuang
### Information
- [X] The official example scripts
- […
-
### 🐛 Describe the bug
torchbench_amp_bf16_inference
- [ ] `sam_fast`
```shell
Traceback (most recent call last):
  File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/common.p…
```
-
### 🚀 The feature
## Author: Li Ning
## Background
A stateful model can capture dependencies between successive inference requests. This type of model maintains a persist…
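As a rough illustration of the idea above, here is a minimal sketch of a stateful inference wrapper in plain Python/NumPy. The class, method names, and toy recurrence are all illustrative assumptions, not the API of any particular framework:

```python
import numpy as np

class StatefulModel:
    """Toy stateful model: output depends on earlier requests via self.state."""

    def __init__(self, hidden_size: int = 4):
        # This state persists between inference requests.
        self.state = np.zeros(hidden_size)

    def infer(self, request: np.ndarray) -> np.ndarray:
        # Toy recurrence: the new output mixes the incoming request with
        # the state accumulated from all previous requests.
        self.state = 0.5 * self.state + request
        return self.state.copy()

    def reset(self) -> None:
        # A stateful model typically exposes a way to start a new sequence.
        self.state = np.zeros_like(self.state)

m = StatefulModel()
out1 = m.infer(np.ones(4))  # first request: state was zero
out2 = m.infer(np.ones(4))  # same input, different output: state persisted
```

Feeding the identical input twice yields different outputs, which is exactly the cross-request dependency that distinguishes a stateful model from a stateless one.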
-
What I have experienced is that C++ inference on the CPU is far slower than the latest [diffusers](https://github.com/huggingface/diffusers). In particular, the sampling in the UNet alone takes ab…
-
Just a handy issue for being notified of the latest changes and micro-releases (we will mostly be changing the models).
-
![image](https://user-images.githubusercontent.com/49277976/63659032-ff893400-c7e9-11e9-9207-f330ac7db6e2.png)
The command that I used: `python main.py --config-file configs/sgg_res101_joint.yaml --i…`
-
I have run the command `python -m torch.distributed.launch --nproc_per_node=1 test_net.py`. In the config file, I used the downloaded "model_finetune.pth" and tested it on the icdar_test dataset. The outpu…
-
Hi, thank you for open-sourcing this project. I have a few questions about inference with quantized models.
(1) If the model uses only W8A8 quantization but the KV cache is not quantized, whether the fo…
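For context on what W8A8 means here, below is a hedged sketch of symmetric per-tensor int8 quantization of both weights and activations, with the matmul accumulated in int32 and dequantized afterward. The scale and rounding scheme is generic and illustrative, not the one used by any specific library:

```python
import numpy as np

def quantize_sym(x: np.ndarray):
    """Symmetric per-tensor int8 quantization: x ≈ q * scale."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 8)).astype(np.float32)  # weights (quantized: W8)
a = rng.standard_normal((4, 8)).astype(np.float32)  # activations (quantized: A8)

qw, sw = quantize_sym(w)
qa, sa = quantize_sym(a)

# int8 x int8 matmul with int32 accumulation, then dequantize the result.
y_quant = (qa.astype(np.int32) @ qw.astype(np.int32).T) * (sa * sw)
y_fp32 = a @ w.T  # full-precision reference

max_err = np.abs(y_quant - y_fp32).max()
```

In a W8A8-only setup like the one asked about, the KV cache would simply be stored in floating point (the dequantized attention outputs), so only the linear-layer inputs and weights go through a path like this.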
-
### 🚀 The feature, motivation and pitch
## Background
Many existing Large Language Models (LLMs) use FP16 during inference to improve performance. Downstream inference libraries, such as vllm, r…
-
After running the script below:
```shell
python3 deploy/python/infer.py \
    --model_dir=output_inference/picodet_lcnet_x1_0_layout/ \
    --image_file=./docs/images/layout.jpg \
    --device=CPU
```
Error message:
```shell
batch_size: 1
…
```