-
### Is there an existing issue for this?
- [X] I have searched the existing issues and checked the recent builds/commits
### What would your feature do?
In both UIs, A1111 and Forge, in **Opti…
-
### Your current environment
I am trying out FP8 support on AMD GPUs (MI250, MI300), but the vLLM library does not yet seem to support FP8 quantization on AMD GPUs. Is there any timeline for when thi…
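For reference, this is roughly how FP8 quantization is requested in vLLM on hardware that already supports it (currently NVIDIA Ada/Hopper); a minimal sketch, with the model name only a placeholder:

```python
from vllm import LLM, SamplingParams

# Minimal sketch: ask vLLM to quantize weights to FP8 on the fly.
# The model name is a placeholder; pick any supported checkpoint.
llm = LLM(model="meta-llama/Llama-2-7b-hf", quantization="fp8")
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```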
-
Hi,
I'd like to test FP8 on an RTX 4090. I can find BF16 MMA atoms such as SM80_16x8x8_F32BF16BF16F32_TN in cutlass/include/cute/arch/mma_sm80.hpp, but I can't find corresponding FP8 atoms like SM80_1…
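One likely reason no such SM80 atoms exist: FP8 tensor-core MMA instructions were only introduced with SM89 (Ada, which includes the RTX 4090) and SM90 (Hopper), so SM80 (A100) has no FP8 MMA path. A quick Python-side sanity check, as a sketch assuming PyTorch ≥ 2.1 (which exposes FP8 storage dtypes):

```python
import torch

# Check whether this GPU has an FP8 tensor-core path: SM89 (Ada) or newer.
major, minor = torch.cuda.get_device_capability()
print(f"sm_{major}{minor}: FP8 MMA {'available' if (major, minor) >= (8, 9) else 'unavailable'}")

# PyTorch >= 2.1 exposes FP8 as a storage dtype regardless of the GPU.
x = torch.randn(16, 16, device="cuda")
x_fp8 = x.to(torch.float8_e4m3fn)           # cast to FP8 E4M3
print((x_fp8.float() - x).abs().max())      # rough quantization error
```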
-
Just FYI, I think this is failing because of a LoRA with only certain blocks trained:
```
  File "flux-fp8-api/flux_pipeline.py", line 163, in load_lora
    self.model = lora_loading.apply_lora_to_…
```
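For context, a defensive merge loop of the general shape below would tolerate partially trained LoRAs; this is a hypothetical sketch (the function name and key layout are assumptions, not the flux-fp8-api code):

```python
import torch

def merge_lora(base_sd, lora_sd, alpha=1.0):
    """Hypothetical sketch, not the flux-fp8-api implementation: fold LoRA
    deltas into base weights, skipping blocks the LoRA never trained."""
    for name, weight in base_sd.items():
        a = lora_sd.get(f"{name}.lora_A")  # assumed key naming convention
        b = lora_sd.get(f"{name}.lora_B")
        if a is None or b is None:
            continue  # this block has no trained LoRA weights; leave it as-is
        base_sd[name] = weight + alpha * (b @ a).to(weight.dtype)
    return base_sd
```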
-
- CPU architecture: x86_64
- GPU: NVIDIA H100
- Libraries:
  - TensorRT-LLM: v0.11.0
  - TensorRT: 10.1.0
  - ModelOpt: 0.13.1
  - CUDA: 12.3
- NVIDIA driver version: 535.129.03
Hello, I'm e…
-
### System Info
GPU Name: 8 × H20
TensorRT-LLM: 0.11.0
NVIDIA-SMI: 535.154.05, Driver Version: 535.154.05, CUDA Version: 12.4
### Who can help?
_No response_
### Information
- [x] The official exam…
-
### 🚀 The feature, motivation and pitch
Hi, the code runs fine; it's just that the generated comments and names are a bit confusing.
Say we have a function with some torch ops at the beginning…
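To illustrate the kind of auto-generated names in question, here is a small sketch using torch.fx tracing (the function is just an example):

```python
import torch
import torch.fx

def f(x):
    y = torch.relu(x)
    return y + y.sum()

# Tracing auto-names every intermediate node (relu, sum_1, add, ...),
# and those names flow into the generated code and its comments.
gm = torch.fx.symbolic_trace(f)
print(gm.code)
```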
-
Thanks for open-sourcing FA3, good job! I am wondering about the FP8 feature.
**Compatibility**: Are the NVIDIA L40 and A100 GPUs compatible with the Flash Attention 3 FP8 feature?
**Performance…
-
When I try to use the `fp8_model_init` feature, it doesn't seem compatible with DDP; it throws an error:
`RuntimeError: Modules with uninitialized parameters can't be used with "DistributedDataParal…`
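For reference, the pattern being described looks roughly like this (a sketch assuming Transformer Engine's PyTorch API; the layer sizes are placeholders):

```python
import torch
import torch.distributed as dist
import transformer_engine.pytorch as te

# Sketch of the reported pattern; assumes launch via torchrun so a process
# group can be initialized. fp8_model_init keeps module weights in FP8.
dist.init_process_group(backend="nccl")
with te.fp8_model_init(enabled=True):
    layer = te.Linear(1024, 1024)

# This wrap is the step where the "uninitialized parameters" error is raised.
model = torch.nn.parallel.DistributedDataParallel(layer)
```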
-
It would be helpful to add a documentation category covering quantized models, which account for a significant share of the interest in Flux,
with emphasis on FP8, GGUF Q8, bnb-NF4, and more.