-
### Describe the issue:
`np.einsum` is ~20x slower than other libraries.
### Reproduce the code example:
```python
import numpy as np
x = np.random.uniform(size=(1000, 1, 500))
y = np.random.uni…
-
The dpp-generated `liblfdsd.d` file causes a `gdc` link error: multiple definition of `_D8liblfdsd14threadConsumerFOCQBc__T9queue_ummTiZQnZv`; /tmp/ccmsmGbO.o:liblfdsd.d:(.text+0x8a00)
Both LDC and DMD wo…
-
Following is the complete list of ops that appear in the torchbench + huggingface + TIMM models, but are not included in the nvFuser fusion group.
They are not included due to one (or more) of the …
-
Based on FW80.10.4 bundle baseline didt testing [results](https://docs.google.com/spreadsheets/d/10uWtBEkLLEM-h5TuuGQ6HW8AjhXwSjFV-cK3iVFYYIU/edit?gid=1118664107#gid=1118664107), this issue will be us…
-
### 🚀 The feature, motivation and pitch
### Motivation
[Cutlass](https://github.com/NVIDIA/cutlass) is an efficient template library for compute-heavy GPU operations like Gemm, Conv and others. It…
-
### 🐛 Describe the bug
I am not sure whether this issue should be a bug or a new feature request. The problem is: when you register a hook to out_proj under MultiheadAttention, it will never be call…
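
A minimal sketch of the behavior described, under stated assumptions: the tensor shapes, `embed_dim=8`, `num_heads=2`, and the `calls` list are illustrative choices and do not come from the original report. It registers a forward hook on `out_proj` and checks whether the hook ran during a forward pass. A likely explanation is that `nn.MultiheadAttention` passes `out_proj.weight` and `out_proj.bias` to a functional attention routine rather than calling the `out_proj` module itself, but that is an inference, not something the report confirms.

```python
import torch
import torch.nn as nn

# Illustrative sizes only (assumptions, not taken from the report).
mha = nn.MultiheadAttention(embed_dim=8, num_heads=2)

calls = []  # records one entry each time the hook fires


def record_call(module, inputs, output):
    # Standard forward-hook signature: (module, inputs, output).
    calls.append(tuple(output.shape))


handle = mha.out_proj.register_forward_hook(record_call)

# Default layout is (sequence, batch, embed) because batch_first=False.
x = torch.randn(5, 3, 8)
out, attn_weights = mha(x, x, x)

hook_fired = len(calls) > 0
print("out_proj hook fired:", hook_fired)

handle.remove()  # clean up the hook
```

On PyTorch versions where the issue applies, `hook_fired` is expected to be `False` even though `out` is computed with the `out_proj` parameters.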
-
### 🐛 Describe the bug
When I tried to use `torch.optim.Adam` with `fused=True`, I got the following error:
```text
File "/home/weixu/venvs/working/lib/python3.10/site-packages/torch/optim/adam…
-
## 🐛 Bug
In the function [p_choose](https://github.com/pytorch/fairseq/blob/f6abcc2a67328bee8b15c596bb626ce2d720aae6/examples/simultaneous_translation/modules/monotonic_multihead_attention.py#L152)…
-
Hello,
Amazing work! Did you ever try other PLMs (BERT, RoBERTa, ...) as your backbone model? Or did they not perform well in your preliminary experiments? Thanks so much.
-
I have successfully executed the shark project using the llama large language model, and it works well. The model was sourced from shark_tank in MLIR format. I would like to run another large language…