-
DJL does not support (or has not documented support for) FP8 quantization ([docs](https://demodocs.djl.ai/docs/serving/serving/docs/lmi/user_guides/trt_llm_user_guide.html#quantization-support)).
…
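For context, quantization for the TensorRT-LLM backend in DJL LMI is driven by a `serving.properties` file. A minimal sketch of generating one, assuming the keys from the linked user guide (the documented `option.quantize` values cover schemes like AWQ and SmoothQuant, with no FP8 entry):

```python
# Sketch: write a serving.properties for DJL LMI's TensorRT-LLM backend.
# Keys and values are assumptions based on the linked user guide; note that
# the documented option.quantize values do not include an fp8 option.
config = {
    "engine": "MPI",
    "option.model_id": "meta-llama/Llama-2-7b-hf",  # placeholder model id
    "option.tensor_parallel_degree": 4,
    "option.quantize": "awq",  # documented: awq / smoothquant; no fp8
}
with open("serving.properties", "w") as f:
    f.writelines(f"{k}={v}\n" for k, v in config.items())
```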
-
- [ ] FP8 KV cache (see the sketch after this list)
- [ ] KV-cache prefix reuse
- [ ] Grammar-constrained decoding speedup
- [ ] `torch.compile`-style speedups
- [ ] Simple one-liner `pip install`
- [ ] Multi-LoRA support (LoRAX-style)
…
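For comparison, the first two items already have a one-liner form in vLLM's Python API. A minimal sketch (parameter names are vLLM's, not this project's, and the model id is a placeholder):

```python
from vllm import LLM, SamplingParams

# Sketch for comparison: FP8 KV cache and prefix reuse as exposed by vLLM.
llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model id
    kv_cache_dtype="fp8",        # FP8 KV cache
    enable_prefix_caching=True,  # KV-cache prefix reuse across requests
)
outputs = llm.generate(["Hello, world"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```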
-
Prompt outputs failed validation
CheckpointLoaderSimple:
- Value not in list: ckpt_name: 'flux1-schnell-fp8.safetensors' not in []
Prompt outputs failed validation
CheckpointLoaderSimple:
- R…
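The empty list `[]` means `CheckpointLoaderSimple` found no checkpoint files at all, so the workflow's `ckpt_name` cannot validate. A minimal sketch to check the expected location, assuming ComfyUI's default `models/checkpoints` layout:

```python
from pathlib import Path

# Sketch: verify the checkpoint is where ComfyUI scans by default.
# Path assumes a stock install; adjust if extra_model_paths.yaml is used.
ckpt_dir = Path("ComfyUI/models/checkpoints")
found = sorted(p.name for p in ckpt_dir.glob("*.safetensors"))
print(found)  # 'flux1-schnell-fp8.safetensors' must appear in this list
```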
-
### Your current environment
The output of `python collect_env.py`
```text
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTor…
-
### System Info
- GPU: 4 x A10G (EC2 g5.12xlarge), 24 GB memory each
- TRTLLM v0.12.0
- torch 2.4.0
- cuda 12.5.1
- tensorrt 10.1
- triton 24.04
- modelopt 0.15
### Who can help?
_No response_
### Info…
-
Please see this commit that Comfy pushed earlier today that fixes the issue where some Flux LoRAs are very weak when used along with fp8. It would be great if Forge were similarly updated so there is co…
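For readers following along, the general technique here is to upcast the fp8 base weight before merging the LoRA delta, since adding a small delta directly in fp8 rounds most of it away. A minimal sketch of that idea (hypothetical function and tensor names, not Comfy's or Forge's actual code):

```python
import torch

def merge_lora_into_fp8(base_w: torch.Tensor,
                        lora_up: torch.Tensor,
                        lora_down: torch.Tensor,
                        scale: float) -> torch.Tensor:
    """Hypothetical sketch: merge a LoRA delta into an fp8-stored weight.

    Upcasting before the add preserves the small delta that fp8
    rounding would otherwise swallow (the 'LoRA is very weak' symptom).
    """
    w = base_w.to(torch.float32)                     # upcast fp8 -> fp32
    delta = scale * (lora_up.to(torch.float32) @ lora_down.to(torch.float32))
    return (w + delta).to(base_w.dtype)              # store back as fp8
```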
-
When trying to quantize a model, the following exception is raised:
```
TorchRuntimeError: Failed running call_function (*(FakeTensor(..., device='cuda:0', size=(2, 32)), LinearActivationQuantizedTensor(Affine…
-
Here is my setup, using Ubuntu:
AMD 6800 XT, 16 GB VRAM
32 GB RAM
Python version: 3.10.12
PyTorch version: 2.2.1+rocm5.7
I am getting between 14-15 s/it with flux1-dev-Q2_K.gguf, also with Q4_0 and Q6_…
-
Why is that?
commit id: b57221b764bc579cbb2490154916a871f620e2c4
The convert command:
```
python build.py --model_dir /data/weilong.yu/vicuna-13b-v1.5/ \
--quantized_fp8_mode…
-
Thank you so much for your contributions to the Text2Video open-source community! I used the same short prompt with the Mochi model through both the CLI demo and the playground, but I noticed a slight…