-
### Checklist
- [x] 1. I have searched related issues but cannot get the expected help.
- [x] 2. The bug has not been fixed in the latest version.
- [x] 3. Please note that if the bug-related iss…
-
### Feature Idea
https://huggingface.co/InstantX/FLUX.1-dev-Controlnet-Union-alpha
### Existing Solutions
_No response_
### Other
_No response_
-
- [ ] FP8 KV-cache
- [ ] KV-cache prefix reuse
- [ ] Grammar-constrained decoding speedup
- [ ] `torch.compile`-like speedups
- [ ] Simple one-liner `pip install`
- [ ] Multi-LoRA support (lorax-style)
…
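Of the items above, KV-cache prefix reuse is the easiest to sketch: keep a trie keyed by token ids that records which prefixes already have cached KV blocks, so a new request only has to prefill its unshared suffix. The names here (`PrefixTrie`, `lookup`, the string block ids) are illustrative, not taken from any particular serving engine:

```python
class PrefixTrie:
    """Toy trie mapping token-id prefixes to cached KV block ids.

    Illustrative only: real engines layer paged KV blocks,
    refcounting, and eviction on top of this idea.
    """

    def __init__(self):
        self.children = {}    # token id -> PrefixTrie
        self.block_id = None  # id of the cached KV block ending at this node

    def insert(self, tokens, block_ids):
        node = self
        for tok, blk in zip(tokens, block_ids):
            node = node.children.setdefault(tok, PrefixTrie())
            node.block_id = blk

    def lookup(self, tokens):
        """Return (reusable block ids, number of tokens still to prefill)."""
        node, reused = self, []
        for tok in tokens:
            if tok not in node.children:
                break
            node = node.children[tok]
            reused.append(node.block_id)
        return reused, len(tokens) - len(reused)


trie = PrefixTrie()
trie.insert([1, 2, 3, 4], ["b0", "b1", "b2", "b3"])  # cache a finished prompt
blocks, todo = trie.lookup([1, 2, 3, 9, 9])          # new prompt shares [1, 2, 3]
print(blocks, todo)  # ['b0', 'b1', 'b2'] 2 -- only 2 tokens left to prefill
```

The same lookup also tells the scheduler how much prefill compute a request actually needs, which is what makes prefix caching a throughput win.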
-
### Your current environment
```text
PyTorch version: 2.1.2+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.3 LTS (x86_64)
GCC ve…
-
I've tried the 12G, 16G, and 20G VRAM options here: https://github.com/kohya-ss/sd-scripts/tree/sd3?tab=readme-ov-file#flux1-lora-training and confirm they all work.
But is it possible …
-
Hello everyone,
First off, a big thanks to city96 for the awesome work they've been contributing to the community. It's been incredibly helpful!
Here are my system specs:
Processor: Intel i5-13…
-
I really like the simplicity of TK and think it could be broadly applicable to kernel authoring beyond attention. Has there been any benchmarking done of pure GEMM operations? If so, an example would …
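Independent of TK itself, a pure-GEMM baseline is cheap to set up for comparison. This numpy sketch times square matmuls and reports achieved GFLOP/s; a GPU-kernel version would have the same shape, with device synchronization around the timed region. The function name and sizes are made up for illustration:

```python
import time
import numpy as np

def bench_gemm(n, iters=10, warmup=2):
    """Time an n x n x n float32 matmul and return achieved GFLOP/s."""
    rng = np.random.default_rng(0)
    a = rng.standard_normal((n, n)).astype(np.float32)
    b = rng.standard_normal((n, n)).astype(np.float32)
    for _ in range(warmup):           # warm caches / BLAS thread pool
        a @ b
    t0 = time.perf_counter()
    for _ in range(iters):
        c = a @ b
    dt = (time.perf_counter() - t0) / iters
    flops = 2.0 * n ** 3              # n^3 fused multiply-adds = 2n^3 flops
    return flops / dt / 1e9

for n in (256, 512, 1024):
    print(f"{n:5d}: {bench_gemm(n):8.1f} GFLOP/s")
```

Sweeping several sizes matters: small GEMMs are launch/bandwidth bound, so a kernel's peak only shows up at larger shapes.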
-
Hi, I successfully full-finetuned FLUX with the Ostris AI Toolkit, and at the end of training I got these 3 files (the diffusion model files):
diffusion_pytorch_model-00001-of-00003.safetensors
dif…
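For context on what those shard files are: Hugging Face-style sharded checkpoints normally ship alongside an `*.index.json` whose `weight_map` records which tensor lives in which shard. A small sketch of reading that map (the index content below is made up for illustration; loading the actual tensors would use the `safetensors` library):

```python
import json

def tensors_by_shard(index_json: str) -> dict:
    """Group tensor names by the shard file that stores them.

    Expects the standard Hugging Face index format:
    {"weight_map": {"tensor.name": "shard-file.safetensors", ...}}
    """
    weight_map = json.loads(index_json)["weight_map"]
    by_shard = {}
    for tensor, shard in weight_map.items():
        by_shard.setdefault(shard, []).append(tensor)
    return by_shard

# Tiny made-up index for illustration:
index = json.dumps({"weight_map": {
    "transformer.block.0.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
    "transformer.block.1.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
}})
print(tensors_by_shard(index))
```

Tools that merge shards into a single file essentially walk this map, load each shard, and re-save the union of tensors.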
-
I tried FLUX training on a 2080 Ti with 22GB of VRAM, but I keep getting an error:
```text
  Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: Ex…
```
-
Hi,
When I tried the FP8 GEMM code in matmul.py, I cast the input "a" to float16 and had it converted to FP8 just before the dot-product op by setting AB_DTYPE to tl.float8e4nv (link: https://github.com/…
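To see roughly what that AB_DTYPE cast does numerically, here is a numpy simulation of round-to-nearest e4m3 quantization (3 stored mantissa bits, max finite value 448). It ignores subnormals and NaN encodings, so it is a sketch of the precision loss, not a bit-exact model of Triton's `tl.float8e4nv` conversion:

```python
import numpy as np

def quantize_e4m3(x):
    """Approximate fp8 e4m3 rounding: keep 3 mantissa bits, clamp to +/-448.

    Sketch only -- subnormals and NaN payloads are ignored.
    """
    x = np.asarray(x, dtype=np.float64)
    m, e = np.frexp(x)              # x = m * 2**e, |m| in [0.5, 1)
    m = np.round(m * 16.0) / 16.0   # 1 implicit + 3 stored mantissa bits
    q = np.ldexp(m, e)              # np.round ties-to-even matches IEEE RNE
    return np.clip(q, -448.0, 448.0)

a = np.array([1.0, 1.1, 0.3, 100.0, 1000.0])
print(quantize_e4m3(a))  # [1.0, 1.125, 0.3125, 96.0, 448.0]
```

The relative step between neighboring values is about 6%, so accumulating the dot product in higher precision (as Triton's `tl.dot` does) matters far more than when the cast to FP8 happens.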