-
## Bug Description
I get an error when converting a Conformer transducer encoder to TensorRT (ASR task).
## To Reproduce
[requirenments.txt](https://github.com/pytorch/TensorRT/files/123430…
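For context, here is a minimal sketch of the kind of conversion involved, assuming a generic PyTorch encoder and the `torch_tensorrt.compile` API; the module, shapes, and precision below are illustrative stand-ins, not taken from the actual report:

```python
import torch
import torch_tensorrt

# hypothetical stand-in for the conformer transducer encoder
encoder = torch.nn.TransformerEncoder(
    torch.nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True),
    num_layers=2,
).eval().cuda()

# compile to TensorRT; input shape and precision are assumptions for illustration
trt_encoder = torch_tensorrt.compile(
    encoder,
    inputs=[torch_tensorrt.Input((1, 100, 256), dtype=torch.float32)],
    enabled_precisions={torch.float32},
)

out = trt_encoder(torch.randn(1, 100, 256, device="cuda"))
```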
-
### System Info
```shell
+-----------------------------------------------------------------------------+
| HL-SMI Version: hl-1.17.0-fw-51.1.0 |
| Driver Ver…
```
-
### Description
It is currently found that `DistilBert` using [`torch.Tensor`](https://github.com/huggingface/transformers/blob/main/src/transformers/models/distilbert/modeling_distilbert.py#L246) ge…
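As a hedged illustration of the kind of problem `torch.Tensor` construction inside a model can cause (the module and values below are hypothetical, not DistilBert's actual code): a tensor created inside `forward` defaults to CPU and float32, which can mismatch a CUDA or fp16 input during tracing or mixed-precision runs.

```python
import torch
import torch.nn as nn

class ScaledAttention(nn.Module):  # hypothetical module, for illustration only
    def forward(self, q: torch.Tensor) -> torch.Tensor:
        # torch.tensor(...) here lives on CPU in float32, which can clash
        # with a CUDA/fp16 input
        scale = torch.tensor(8.0)
        # casting to the input's device/dtype sidesteps the mismatch
        return q / scale.to(device=q.device, dtype=q.dtype)
```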
-
**Description**
We found that the performance of Triton + TensorRT differs significantly between stable QPS and uneven QPS, as follows:
- uneven QPS
(1) QPS
![image](https://github.com/triton-inference-se…
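For reference, a minimal sketch of what "stable" versus "uneven" QPS means as a load pattern, using a hypothetical `send_request()` standing in for one inference call (Triton's own `perf_analyzer` can generate both patterns; this is only an illustration):

```python
import random
import time

def send_request():
    """Hypothetical stand-in for one Triton inference call."""
    pass

def run_load(qps: float, seconds: float, uneven: bool) -> None:
    # stable load: fixed inter-arrival gap of 1/qps
    # uneven load: exponential gaps (Poisson arrivals) with the same mean rate
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        send_request()
        gap = random.expovariate(qps) if uneven else 1.0 / qps
        time.sleep(gap)

run_load(qps=50, seconds=10, uneven=True)   # uneven QPS
run_load(qps=50, seconds=10, uneven=False)  # stable QPS
```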
-
### System Info
```
(zt) root@autodl-container-7071118252-7032359d:~/test/PiPPy/examples/llama# transformers-cli env
Copy-and-paste the text below in your GitHub issue and FILL OUT the two last p…
```
-
torch version: 2.5.0.dev20240616+cu121
python version: 3.8
I run the llama example with `torchrun --nproc-per-node 2 pippy_llama.py`, and it fails with an error:
```
Loading checkpoint shards: 100%|███…
```
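As a side note, a minimal sketch for isolating whether the `torchrun` setup itself works, independent of the PiPPy example (the file name is hypothetical):

```python
# save as check_dist.py (hypothetical name) and run:
#   torchrun --nproc-per-node 2 check_dist.py
import torch
import torch.distributed as dist

dist.init_process_group(backend="gloo")  # gloo avoids needing two GPUs
print(f"rank {dist.get_rank()}/{dist.get_world_size()}, torch {torch.__version__}")
dist.destroy_process_group()
```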
-
## 🐛 Bug
When I deploy my own 2B model using MLC on Android, the model interface initializes successfully and displays the "Ready to chat" prompt after opening. However, the app crashes after sendi…
-
Hi,
I encounter the following error message when trying to enable flash attention with the command below. Is flash attention supported?
command: `./main -m $model -n 128 --prompt …`
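For what it's worth, a minimal sketch of requesting flash attention through the llama-cpp-python bindings instead of the CLI; this assumes the `flash_attn` constructor flag available in recent llama-cpp-python releases (whether it takes effect depends on the build and backend), and the model path is hypothetical:

```python
from llama_cpp import Llama  # pip install llama-cpp-python

llm = Llama(
    model_path="model.gguf",  # hypothetical path
    flash_attn=True,          # request flash attention (assumes a recent build)
)
print(llm("Hello", max_tokens=16)["choices"][0]["text"])
```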
-
## Description
When I use your demo/Diffusion/demo_txt2img_xl.py for INT8 inference, it reports an error:
Invoked with: %338 : Tensor = onnx::Constant(), scope: transformers.models.clip…
-
This is pretty weird.
The attention alignment plot is also blank:
![alignment_009k](https://user-images.githubusercontent.com/2422433/40480115-ce19c76e-5f4d-11e8-8fa7-afe41d0a3bbf.png)
I r…
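A quick hedged check that often distinguishes a rendering problem from a genuinely empty alignment matrix (the file path below is hypothetical):

```python
import numpy as np
import matplotlib.pyplot as plt

# hypothetical path to the saved alignment matrix
align = np.load("alignment.npy")

# a "blank" plot usually means all zeros or NaNs rather than a plotting bug
print("min:", align.min(), "max:", align.max(), "NaNs:", np.isnan(align).any())

plt.imshow(align.T, aspect="auto", origin="lower", interpolation="none")
plt.xlabel("decoder step")
plt.ylabel("encoder step")
plt.savefig("alignment_check.png")
```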