fp8 Search Results - Githubissues

1000+ results
for fp8

Best match

Best match Most commented Newest Recently updated Least commented Oldest Least recently updated

deepjavalibrary/djl #3145

Support for FP8 quantization with TensorRT-LLM

DJL does not support (or has not documented support) for FP8 quantization ([docs](https://demodocs.djl.ai/docs/serving/serving/docs/lmi/user_guides/trt_llm_user_guide.html#quantization-support)). …

nathan-az updated 6 months ago
2
huggingface/text-generation-inference #1819

Planned/Potential of significant work

- [ ] Fp8 kv-cache - [ ] Kv-cache prefix reuse - [ ] Grammar constrained speedup - [ ] `torch.compile` like speedups - [ ] Simple one-liner `pip install` - [ ] Multi lora support (lorax kind of) …

Narsil updated 2 months ago
4
black-forest-labs/flux #96

As a computer novice, how can I solve this problem?

Prompt outputs failed validation CheckpointLoaderSimple: - Value not in list: ckpt_name: 'flux1-schnell-fp8.safetensors' not in [] Prompt outputs failed validation CheckpointLoaderSimple: - R…

ccs779 updated 2 months ago
1
vllm-project/vllm #7579

[Bug]: fp8 performance is worse than fp16 when batch size is…

### Your current environment The output of `python collect_env.py` ```text Collecting environment information... PyTorch version: 2.3.0+cu121 Is debug build: False CUDA used to build PyTor…

kuangdao updated 2 months ago
10
NVIDIA/TensorRT-LLM #2398

T5 out of memory

### System Info - GPU 4 x A10G (EC2 g5.12xlarge) - memory 24GB - TRTLLM v0.12.0 - torch 2.4.0 - cuda 12.5.1 - tensorrt 10.1 - triton 24.04 - modelopt 0.15 ### Who can help? _No response_ ### Info…

ydm-amazon updated 2 days ago
10
lllyasviel/stable-diffusion-webui-forge #1252

Fix loras having a weak effect when applied on fp8. - [Comfy…

Please see this commit that Comfy pushed earlier today that fixes the issue where some Flux LoRA are very weak when using along w/ fp8. It would be great if Forge were similarly updated so there is co…

CCpt5 updated 2 months ago
7
pytorch/ao #1268

fp8dq requires both dimensions to be divisible by 16

When trying to quantize a model the exepction is raised: ``` TorchRuntimeError: Failed running call_function (*(FakeTensor(..., device='cuda:0', size=(2, 32)), LinearActivationQuantizedTensor(Affine…

piotr-bazan-nv updated 1 day ago
1
city96/ComfyUI-GGUF #48

Low speed on AMD GPU

Here is my setup, using ubuntu AMD 6800 XT 16GB Vram 32GB Ram Python version: 3.10.12 pytorch version: 2.2.1+rocm5.7 I am getting between 14s-15s/it with flux1-dev-Q2_K.gguf, also Q4_0 and Q6_…

ArkhamInsanity updated 1 month ago
37
NVIDIA/TensorRT-LLM #959

[BUG] fp8 run failed on L40s

Why is that? commit id: b57221b764bc579cbb2490154916a871f620e2c4 the convert command ``` python build.py --model_dir /data/weilong.yu/vicuna-13b-v1.5/ \ --quantized_fp8_mode…

sleepwalker2017 updated 9 months ago
1
genmoai/models #65

About implicit prompt refiner or negative prompt in playgrou…

Thank you so much for your contributions to the Text2Video open-source community! I used the same short prompt with the Mochi model through both the CLI demo and the playground, but I noticed a slight…

DZY-irene updated 1 week ago
2

上一页 1...31 32 33 34 35 36 37...100 下一页

1000+ results for fp8

1000+ results
for fp8