-
I want to use INT8 matmul, and the code/output is as follows:
### Code
```
import bitblas
import torch

bitblas.set_log_level("Debug")

matmul_config = bitblas.MatmulConfig(
    M=16,  # M dime…
```
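Since the config above is cut off, here is a minimal sketch of what a complete INT8 BitBLAS matmul setup might look like; the N/K shapes, layout, and int32 accumulator are assumptions, not the original poster's values:

```
import bitblas
import torch

# Hypothetical INT8 x INT8 -> INT32 configuration; N, K, and the
# layout are assumed, since the original snippet ends at M=16.
matmul_config = bitblas.MatmulConfig(
    M=16,
    N=1024,
    K=1024,
    A_dtype="int8",       # activation dtype
    W_dtype="int8",       # weight dtype
    accum_dtype="int32",  # accumulate in int32 to avoid overflow
    out_dtype="int32",
    layout="nt",          # A row-major, W transposed
)
matmul = bitblas.Matmul(config=matmul_config)

# Random int8 operands on the GPU; BitBLAS expects the weight to be
# transformed into its internal layout before the call.
a = torch.randint(-8, 8, (16, 1024), dtype=torch.int8).cuda()
w = torch.randint(-8, 8, (1024, 1024), dtype=torch.int8).cuda()
w_t = matmul.transform_weight(w)
c = matmul(a, w_t)
```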
-
### Description & Motivation
Both the Fabric and Trainer strategies are designed to have a single plugin enabled from the beginning to the end of the program.
This has been fine historically, ho…
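To make the constraint concrete, a minimal sketch assuming the Lightning 2.x API: the precision plugin is chosen once when the `Trainer` is constructed and remains the single active plugin for the whole run.

```
from lightning import Trainer
from lightning.pytorch.plugins import MixedPrecision

# The single plugin passed here stays enabled from the beginning
# to the end of the program; there is no supported way to swap it
# once training has started.
trainer = Trainer(plugins=[MixedPrecision(precision="16-mixed", device="cuda")])
```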
-
I've tried different releases of forge (cu121 with torch21; cu121 with torch231; cu124 with torch24), but I get an error when loading the flux1-dev-fp8 model.
I also tried changing GPU Weights or swap loca…
-
Hi,
Has anyone tried OpenMM at floating-point precision lower than FP32? Can one still run simulations in FP16 or FP8? Which operations could ideally be moved to lower precision?
Thanks!
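For context, precision in OpenMM is selected per platform, and the CUDA/OpenCL platforms only expose "single", "mixed", and "double"; a minimal sketch of how that is set today (FP16/FP8 would require changes inside the kernels themselves, not just a property):

```
from openmm import Platform

# The CUDA platform accepts a "Precision" property with the values
# "single", "mixed", or "double"; nothing below FP32 is exposed.
platform = Platform.getPlatformByName("CUDA")
properties = {"Precision": "mixed"}
# The properties dict is then passed when building the Simulation,
# e.g. Simulation(topology, system, integrator, platform, properties).
```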
-
### System Info
- GPU name: L40s
- CUDA: 12.1
```
Wed Jun 5 16:27:21 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14 …
```
-
Please add an option to adjust GPU Weights, since my GPU only has 6 GB of VRAM.
My laptop RTX 3060 can run normal FP8 within 100-150 sec,
but it takes super long with NF4 (my GPU runs at 99% all the time an…
-
I'm currently testing Llama2 70B on DGX-A100 and DGX-H100. I'm running the gptManagerBenchmark as described [here](https://github.com/NVIDIA/TensorRT-LLM/tree/release/0.5.0/benchmarks/cpp) and compari…
-
I'm having this error, which I assume indicates that I don't have enough VRAM. However, I'm able to run the FP8 version of flux-dev and this exact same model in the Forge webui with no issues at all, s…
-
I can't use the FP8 transformer on my 3090 Ti (24 GB). I tried the PyTorch nightly (2.5.0) and the latest release (2.4.0); same error every time:
![2024-08-28_12-14-24](https://github.com/user-attachments/assets/…
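Not part of the original report, but a likely explanation: PyTorch's FP8 dtypes exist in recent builds, while hardware-accelerated FP8 matmul requires compute capability 8.9 (Ada/Hopper), and a 3090 Ti is SM 8.6. A quick check, as a sketch:

```
import torch

# torch.float8_e4m3fn exists since PyTorch 2.1, but the FP8 matmul
# kernels behind it (torch._scaled_mm) require compute capability
# >= 8.9 (Ada/Hopper). A 3090 Ti reports SM 8.6.
major, minor = torch.cuda.get_device_capability()
print("compute capability:", (major, minor))
print("hardware FP8 matmul supported:", (major, minor) >= (8, 9))
```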