-
In short, we observed that `mixed_bfloat16` on TPU is slower than `float32` in our model benchmarks. Please refer to this [sheet](https://docs.google.com/spreadsheets/d/1TPwbe8p6eD61arkoIXQnPHf3rgFIDFUZCot…
-
Hello!
This is the problem I get when I use `grounded_sam2_local_demo.py` for image inference:
-
My simple inference script is failing when calling `wrapper.merge_to()` with Flux Dev as the base model.
```
2024-09-21 19:27:53|[LyCORIS]-INFO: Loading Modules from state dict...
2024-09-21 19:27:…
-
**Describe the bug**
I followed [02_pytorch_extension_grouped_gemm.ipynb](https://github.com/NVIDIA/cutlass/blob/main/examples/python/02_pytorch_extension_grouped_gemm.ipynb).
Then I changed the dtype from…
-
### Environment
**Operating System:** Linux NixOS
**Version / Commit SHA:** fVDB
**Other:** gcc 10.5.0
### Describe the bug
I'm trying to build fVDB with CUDA 12.2, but the build fails with t…
-
got prompt
!!! Exception during processing !!! No operator found for `memory_efficient_attention_forward` with inputs:
query : shape=(1, 577, 16, 64) (torch.bfloat16)
key : …
-
I'm using ROCm 5.7. Currently there are two datatypes for `bfloat16` -- `hip_bfloat16` and `__hip_bfloat16`. They seem to be defined respectively as
```
struct __hip_bfloat16 {
unsigned short d…
-
### System Info
2X L4 GPUs
Docker Image:
nvcr.io/nvidia/tritonserver:24.06-trtllm-python-py3
### Who can help?
@juney-nvidia @kaiyux
### Information
- [ ] The official example sc…
-
### System Info
- OS: Ubuntu 20.04
- GPU: RTX 2080TI
### Who can help?
@byshiue @ncomly-nvidia
### Information
- [x] The official example scripts
- [ ] My own modified scripts
### Tasks
- [x] …
-
Hi, I'm testing the local install and interface that Dr. Furkan Gözükara made for Supir, and it's working really well on a 4090, but I get the following error when I try to use it on an RTX8000.
RuntimeE…