-
### 🚀 The feature, motivation and pitch
Doing matmuls in FP8 is essential for getting the best performance out of the newest hardware (e.g. H100s and beyond). Currently, the only option for doing m…
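For readers unfamiliar with how FP8 matmuls are typically driven, the usual pattern is a *scaled* matmul: each input is divided by a per-tensor scale so it fits the FP8 dynamic range, the product is accumulated in higher precision, and both scales are folded back into the output. The sketch below illustrates only that scaling bookkeeping in pure Python (actual mantissa rounding to FP8 is omitted); the function names are ours, not any library's API.

```python
E4M3_MAX = 448.0  # largest finite value representable in OCP FP8 E4M3

def quantize_per_tensor(x, fp8_max=E4M3_MAX):
    """Scale a matrix into the FP8 range; return (scaled matrix, scale)."""
    amax = max(abs(v) for row in x for v in row)
    scale = amax / fp8_max if amax > 0 else 1.0
    q = [[max(-fp8_max, min(fp8_max, v / scale)) for v in row] for row in x]
    return q, scale

def scaled_matmul(a, b):
    """Sketch of a scaled FP8-style matmul: quantize both inputs,
    multiply, then fold the two scales back into the FP32 output."""
    qa, sa = quantize_per_tensor(a)
    qb, sb = quantize_per_tensor(b)
    n, k, m = len(qa), len(qb), len(qb[0])
    return [[sa * sb * sum(qa[i][t] * qb[t][j] for t in range(k))
             for j in range(m)] for i in range(n)]
```

On real hardware the quantize step also rounds each value to the nearest FP8 code, which is where the accuracy/performance trade-off comes from; here the rescaled result is exact up to float rounding.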
-
As we start onboarding more dtypes, we ideally want them to work in as many different situations as possible, so I'm opening this tracker and will update the table as things change. If I should be adding mo…
-
I have an RTX 2060 Super (8GB VRAM) and 64GB RAM. Using this NF4 checkpoint loader with either the Schnell or Dev NF4 variants, image generation time increases 20-fold rather than decreasing. Might m…
-
### Problem Description
### Parsing OCP FP8 Model
This would require MIGraphX to expose the E4M3FN data type in the IR. Currently only the E4M3FNUZ type is exposed. It is probably not much work to expo…
-
ERROR:root:!!! Exception during processing !!!
ERROR:root:Traceback (most recent call last):
File "C:\Product\ComfyUI\ComfyUI_windows_portable\ComfyUI\execution.py", line 153, in recursive_execute…
-
File "/home/user/app/custom_nodes/ComfyUI-CogVideoXWrapper/nodes.py", line 169, in process
clip.tokenizer.t5xxl.pad_to_max_length = True
AttributeError: 'SD1Tokenizer' object has no attribute …
-
Chunked context is a great feature for GPU-memory-capacity-bound scenarios: by enabling it, I can increase batch_size without worrying that (batch * avg_prefill_seq) may cause an OOM.
However…
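To make the memory argument concrete: without chunking, a step's prefill activation footprint grows with the full prompt lengths, while chunked context caps the number of prompt tokens processed per step. A toy greedy scheduler sketching this idea (our own illustration, not TensorRT-LLM's actual scheduler):

```python
def schedule_chunked_prefill(prompt_lens, max_tokens_per_step):
    """Greedy sketch: each step processes at most max_tokens_per_step
    prompt tokens, splitting long prompts across successive steps.
    Returns a list of steps; each step is a list of (request_id, tokens)."""
    remaining = list(prompt_lens)
    steps = []
    while any(r > 0 for r in remaining):
        budget = max_tokens_per_step
        step = []
        for i, r in enumerate(remaining):
            if r == 0 or budget == 0:
                continue
            take = min(r, budget)
            step.append((i, take))
            remaining[i] -= take
            budget -= take
        steps.append(step)
    return steps
```

Since every step touches at most `max_tokens_per_step` prompt tokens, peak prefill memory is bounded by the chunk budget instead of `batch * avg_prefill_seq`, which is what allows the larger batch sizes.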
-
### Expected Behavior
The checkpoint "flux1-dev-fp8.safetensors" should load successfully using the CheckpointLoaderSimple node.
![ComfyUI](https://github.com/user-attachments/assets/329f0529-5052…
-
Any plans to develop DMD2 for Flux?
The original: https://huggingface.co/black-forest-labs/FLUX.1-dev/tree/main
The fp8 version: https://huggingface.co/Comfy-Org/flux1-dev/tree/main
-
### System Info
CPU architecture: x86_64
Host RAM: 1TB
GPU: 8xH100 SXM
Container: Manually built container with TRT 9.3 Dockerfile.trt_llm_backend
TensorRT-LLM version: 0.10.0.dev2024043000
Dr…