-
### System Info
PyTorch version: 2.3.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang v…
-
## 🐛 Bug
I'm following [Pytorch Vulkan backend user workflow](https://pytorch.org/tutorials/prototype/vulkan_workflow.html#android-java-api) in order to build a libtorch binary that includes Vulkan…
-
I found the crash dump in the log. It seems the crash occurs when the matrix is freed in the function cblas_sgemm_fixed.
I set libc.debug.malloc to 10, and it reported a rear guard mismatch of 20 bytes.
> …
-
Hi, are there any benchmarks for Adreno GPUs? Has clblas been tuned for Adreno GPUs?
I only found an issue about tuning, but no benchmark information.
Thanks in advance.
-
We plan to add QAT for LLMs to torchao (as mentioned in the original RFC here https://github.com/pytorch-labs/ao/issues/47)
For this to run efficiently on the GPU we'd need kernel support for W4A8…
-
## 🚀 Feature
At https://pytorch.slack.com/archives/C3PDTEV8E/p1638511540268500 we were discussing how depending on the model type the different bf16/amp or tf32 modes may or may not do much speed i…
-
I'm trying to run llama-7b with TensorRT-LLM. I build the TensorRT engine as follows:
python3 build.py --model_dir /opt/llms/llama-7b
--dtype float16
…
-
if ( bli_does_notrans( transa ) )
    bli_obj_create( dt, m, k, rs_a, cs_a, &a );
else
    bli_obj_create( dt, k, m, cs_a, rs_a, &a );
if ( bli_does_notrans( transb ) )
bli_obj_cre…
-
### System Info
ubuntu 20.04
tensorrt 10.0.1
tensorrt-cu12 10.0.1
tensorrt-cu12-bindings 10.0.1
tensorrt-cu12-libs 10.0.1
tensorrt-llm 0.10.…
-
## 🐛 Bug
`torch.mm/addmm` call `cublasGemmEx` under the hood. However, there are type combinations that PyTorch claims are unsupported even though they should work fine:
Example:
…