-
## 🚀 Feature
At https://pytorch.slack.com/archives/C3PDTEV8E/p1638511540268500 we were discussing how, depending on the model type, the different bf16/amp or tf32 modes may or may not do much speed i…
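For reference, a minimal sketch (assuming an Ampere-or-newer GPU) of how the modes in question are toggled in PyTorch; how much either one helps depends on how matmul-bound the model is:

```python
import torch

# TF32 affects float32 matmuls/convolutions on Ampere+ GPUs.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

# AMP/bf16 runs selected ops in bfloat16 under autocast.
model = torch.nn.Linear(1024, 1024).cuda()
x = torch.randn(8, 1024, device="cuda")
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    y = model(x)
print(y.dtype)  # torch.bfloat16
```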
-
We plan to add QAT (quantization-aware training) for LLMs to torchao, as mentioned in the original RFC here: https://github.com/pytorch-labs/ao/issues/47.
For this to run efficiently on the GPU, we'd need kernel support for W4A8…
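As a rough sketch of what W4A8 means (generic symmetric fake quantization for illustration, not torchao's QAT code): weights are quantized to 4 bits and activations to 8 bits, and an efficient kernel would perform the low-precision GEMM directly instead of dequantizing to float as done here:

```python
import torch

def fake_quant(t, n_bits):
    # Symmetric per-tensor fake quantization: round onto a signed
    # n_bits grid, then dequantize back to float.
    qmax = 2 ** (n_bits - 1) - 1
    scale = t.abs().max() / qmax
    return (t / scale).round().clamp(-qmax - 1, qmax) * scale

w = torch.randn(256, 256)  # weights     -> 4-bit ("W4")
x = torch.randn(8, 256)    # activations -> 8-bit ("A8")
y = fake_quant(x, 8) @ fake_quant(w, 4).t()
```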
-
I am trying to verify the test results of the Raspberry Pi Zero W as listed in the table under the [Raspberry Pi section](https://github.com/google/XNNPACK#raspberry-pi) in the [README.md](https://git…
-
```c
/* Create the A operand: if A is not transposed, it is m x k with
   strides (rs_a, cs_a); otherwise create it as k x m with the
   strides swapped, so the transpose is expressed through the
   object's dimensions and strides rather than a data copy. */
if ( bli_does_notrans( transa ) )
    bli_obj_create( dt, m, k, rs_a, cs_a, &a );
else
    bli_obj_create( dt, k, m, cs_a, rs_a, &a );

/* Same pattern for the B operand. */
if ( bli_does_notrans( transb ) )
    bli_obj_cre…
```
-
If we can speed up the BERT model, we will significantly increase throughput for many use cases. We will experiment with SentenceTransformers first (see the sketch below).
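A minimal throughput probe, assuming the `sentence-transformers` package; the checkpoint name below is an arbitrary BERT-family example, not part of the original plan:

```python
import time
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed example model
sentences = ["An example sentence to embed."] * 1024

start = time.perf_counter()
model.encode(sentences, batch_size=64)
elapsed = time.perf_counter() - start
print(f"{len(sentences) / elapsed:.1f} sentences/sec")
```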
-
Hi,
I'm seeing higher losses when using `te.Linear` in place of `nn.Linear` in transformer models such as Llama, which I assume is expected given the reduced precision of FP8.
However, I don't see a loss inc…
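For concreteness, a minimal sketch of the kind of swap being described, using Transformer Engine's standard `fp8_autocast` path (the recipe settings here are placeholder assumptions):

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Drop-in replacement for nn.Linear; FP8 is only used inside autocast.
linear = te.Linear(1024, 1024, bias=True).cuda()
x = torch.randn(16, 1024, device="cuda")

fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = linear(x)
```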
-
For 2D inputs, `np.matmul` and `np.dot` are semantically the same, but I've found that in some cases `matmul` can be much slower even though the documentation for `np.dot` says `matmul` is preferred f…
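A quick way to reproduce the comparison (shapes and iteration count here are arbitrary; results will vary with the BLAS backend):

```python
import timeit
import numpy as np

a = np.random.rand(512, 512)
b = np.random.rand(512, 512)

# For 2-D float64 arrays, both calls should reach the same BLAS GEMM.
print("dot:   ", timeit.timeit(lambda: np.dot(a, b), number=100))
print("matmul:", timeit.timeit(lambda: np.matmul(a, b), number=100))
```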
-
in function "nv_wavenet_persistent_cur", we add values from "embedPrev" and "embedCur". Then we save the values in "xt_sh" and the same values in "xt". Then, before the GEMM, we update the values in "…
-
## 🐛 Bug
`torch.mm`/`torch.addmm` call `cublasGemmEx` under the hood. However, there are type combinations that PyTorch claims are unsupported even though they should work fine:
Example:
…
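Since the issue's own example is cut off above, the combination below is only an illustrative guess (assuming a CUDA device): `cublasGemmEx` supports FP16 inputs with an FP32 output/compute type, but `torch.mm` rejects the mismatched `out` dtype:

```python
import torch

a = torch.randn(64, 64, device="cuda", dtype=torch.float16)
b = torch.randn(64, 64, device="cuda", dtype=torch.float16)
out = torch.empty(64, 64, device="cuda", dtype=torch.float32)

# fp16 x fp16 -> fp32 is a supported cublasGemmEx combination,
# yet torch.mm raises on the dtype mismatch.
try:
    torch.mm(a, b, out=out)
except RuntimeError as e:
    print(e)
```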
-
### System Info
CUDA 12.2, torch 2.1
### Who can help?
@byshiue
### Information
- [X] The official example scripts
- [ ] My own modified scripts
### Tasks
- [X] An officially supported task in th…