-
Does the KV cache support fp8 or int8 while the computation stays in fp16? Reading the KV cache as int8 is faster than reading it as fp16; the int8 values could then be converted to fp16 in shared memory and used for the computation.
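A minimal sketch of the idea being asked about, assuming a simple per-head symmetric int8 scheme (the function names and scale layout here are illustrative, not from any particular engine): the KV cache is stored and read as int8, then dequantized to fp16 right before the attention matmuls.

```python
import torch

def quantize_kv(kv_fp16: torch.Tensor):
    # kv_fp16: [batch, heads, seq, head_dim]; per-head symmetric int8 quantization
    scale = kv_fp16.abs().amax(dim=(-2, -1), keepdim=True).clamp(min=1e-6) / 127.0
    kv_int8 = torch.round(kv_fp16 / scale).clamp(-127, 127).to(torch.int8)
    return kv_int8, scale.to(torch.float16)

def dequantize_kv(kv_int8: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # In a real kernel this dequantization would happen in shared memory,
    # after the int8 tiles are loaded from global memory (half the bytes of fp16).
    return kv_int8.to(torch.float16) * scale

kv = torch.randn(1, 8, 128, 64, dtype=torch.float16)
kv_q, s = quantize_kv(kv)
print((dequantize_kv(kv_q, s) - kv).abs().max())  # quantization error
```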
-
On my system, I have enough VRAM (72 GB) to run Llama-3-70B in 4-bit or 8-bit precision. However, I am unable to quantize this model to either 4-bit or 8-bit precision using the scripts in TensorRT-LL…
-
### 🐛 Describe the bug
Running torch.compile() on this Triton FP8 matmul code:
```python
from torch import Tensor

def run_gemm() -> Tensor:
    x_fp8: Tensor
    w_fp8: Tensor
    x_scale: Tensor
    …
```
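The snippet is cut off here; as a point of reference, an FP8 GEMM of this shape typically computes the dequantize-then-matmul semantics sketched below. This is a hedged reconstruction for illustration only, not the issue's actual Triton kernel.

```python
import torch

def reference_fp8_gemm(x_fp8: torch.Tensor, w_fp8: torch.Tensor,
                       x_scale: torch.Tensor, w_scale: torch.Tensor) -> torch.Tensor:
    # Reference semantics: upcast the fp8 operands, apply their scales,
    # and accumulate in higher precision. A Triton kernel fuses all of this.
    x = x_fp8.to(torch.float32) * x_scale
    w = w_fp8.to(torch.float32) * w_scale
    return (x @ w.t()).to(torch.float16)

x = torch.randn(16, 32).to(torch.float8_e4m3fn)
w = torch.randn(64, 32).to(torch.float8_e4m3fn)
out = reference_fp8_gemm(x, w, torch.tensor(1.0), torch.tensor(1.0))
```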
-
### 🐛 Describe the bug
```python
import torch
import torch._inductor.config

torch._inductor.config.force_mixed_mm = True

def f(a, b):
    return torch.mm(a, b.to(a.dtype))

fp16_act = torc…
```
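The repro is truncated at this point. A minimal sketch of how such a mixed-mm repro is usually completed, assuming an fp16 activation, an int8 weight, a CUDA device, and a PyTorch version where the `force_mixed_mm` flag still exists (shapes and variable names below are illustrative, not the original ones):

```python
import torch
import torch._inductor.config

# Assumes a PyTorch build where this inductor flag is still present.
torch._inductor.config.force_mixed_mm = True

def f(a, b):
    return torch.mm(a, b.to(a.dtype))

# Illustrative inputs: fp16 activation times int8 weight, upcast inside the matmul.
fp16_act = torch.randn(16, 32, dtype=torch.float16, device="cuda")
int8_weight = torch.randint(-128, 127, (32, 64), dtype=torch.int8, device="cuda")

compiled_f = torch.compile(f)
out = compiled_f(fp16_act, int8_weight)
```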
-
When I continue pretraining HF models with fp8, I get an error:
TypeError: ComposerHFCausalLM.__init__() got an unexpected keyword argument 'fc_type'
-
Thanks for this great project! I have some questions about how you implemented matmul for two MX-format matrices.
This repo appears to provide a simulation of it, but does not provide an actual CUDA impl…
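For context on what a simulated MX matmul typically looks like, here is a minimal sketch of fake-quantizing both operands to an MX-like format (blocks of 32 elements sharing one power-of-two scale, fp8 e4m3 element values) and multiplying in fp32. The block size, scale rule, and function names are generic illustrations, not this repo's implementation.

```python
import torch

def mx_quant_dequant(x: torch.Tensor, block: int = 32) -> torch.Tensor:
    # Simulate MX quantization along the last dim: each block of `block` values
    # shares one power-of-two scale; elements are stored as fp8 e4m3.
    orig_shape = x.shape
    xb = x.reshape(-1, block)
    amax = xb.abs().amax(dim=-1, keepdim=True).clamp(min=1e-12)
    # Power-of-two shared scale so the block's largest magnitude fits in e4m3 (max 448).
    scale = torch.exp2(torch.ceil(torch.log2(amax / 448.0)))
    elems = (xb / scale).to(torch.float8_e4m3fn)                   # quantize elements
    return (elems.to(torch.float32) * scale).reshape(orig_shape)   # dequantize back

def simulated_mx_matmul(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Both operands are fake-quantized along their reduction dim, then multiplied in fp32.
    return mx_quant_dequant(a) @ mx_quant_dequant(b.t()).t()

a = torch.randn(64, 128)
b = torch.randn(128, 256)
print((simulated_mx_matmul(a, b) - a @ b).abs().max())  # simulation error vs fp32
```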
-
I'm doing an A (4-bit) x B (fp16) matmul with a large A and a small B. I expect it to beat an fp8 matmul, since it should be memory-bound.
In reality, it is consistently worse.
Example:
Kernel code is here: https…
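For context, the memory-bound expectation comes from simple byte counting: with a large A and a small B, traffic is dominated by reading A, and 4-bit weights are half the bytes of fp8. A rough back-of-the-envelope sketch under those assumptions (shapes are illustrative, not the ones from the linked kernel):

```python
# Rough traffic estimate for an M x K by K x N matmul dominated by reading A.
M, K, N = 8192, 8192, 16            # large A, small B (illustrative shapes)

bytes_a_int4 = M * K * 0.5          # 4-bit packed A
bytes_a_fp8  = M * K * 1.0          # fp8 A
bytes_b_fp16 = K * N * 2.0          # small B, roughly negligible
bytes_out    = M * N * 2.0          # fp16 output

print("int4 A traffic (MiB):", (bytes_a_int4 + bytes_b_fp16 + bytes_out) / 2**20)
print("fp8  A traffic (MiB):", (bytes_a_fp8  + bytes_b_fp16 + bytes_out) / 2**20)
# If the kernel were purely bandwidth-limited, the int4 version should take roughly
# half the time; in practice dequantization overhead and a less efficient inner loop
# can erase that advantage.
```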
-
Hello @AdnanHoque, I am trying to recreate the results from the blog [Accelerating Llama3 FP8 Inference with Triton Kernels](https://pytorch.org/blog/accelerating-llama3/). I haven't been able to get…
-
I am using a GGUF model on Aphrodite Engine. I want a context length of 8192, but I can only get it to load with about a 4096 context length; the issue is that I'm short on VRAM…
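A rough sketch of why context length ends up VRAM-limited: the KV cache grows linearly with context length, and its size depends on the cache dtype. The model shape numbers below are illustrative (roughly 7B-class), not taken from the post:

```python
# Approximate KV cache size: 2 (K and V) * layers * kv_heads * head_dim * ctx * bytes.
layers, kv_heads, head_dim = 32, 32, 128   # illustrative 7B-class shape

def kv_cache_gib(ctx_len: int, bytes_per_elem: float) -> float:
    return 2 * layers * kv_heads * head_dim * ctx_len * bytes_per_elem / 2**30

for ctx in (4096, 8192):
    print(ctx, "fp16:", round(kv_cache_gib(ctx, 2), 2), "GiB,",
          "int8/fp8:", round(kv_cache_gib(ctx, 1), 2), "GiB")
```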
-
Add the ability to quantize to FP8. This will clearly need additional issues to be opened: flags for the C++/Python API, test cases, updates to our migraphx-driver, new kernels, an FP8 library, etc.
…
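For reference, the core numeric step behind FP8 (e4m3) quantization is small; a minimal, framework-agnostic sketch of per-tensor scaling is shown below (purely illustrative, not MIGraphX code):

```python
import torch

E4M3_MAX = 448.0  # largest finite value representable in fp8 e4m3

def quantize_fp8_e4m3(x: torch.Tensor):
    # Per-tensor scale so the largest magnitude maps to the e4m3 max, then cast.
    scale = x.abs().amax().clamp(min=1e-12) / E4M3_MAX
    x_fp8 = (x / scale).clamp(-E4M3_MAX, E4M3_MAX).to(torch.float8_e4m3fn)
    return x_fp8, scale  # keep the scale for dequantization / scaled matmuls

x = torch.randn(4, 4)
x_fp8, s = quantize_fp8_e4m3(x)
print((x_fp8.to(torch.float32) * s - x).abs().max())  # quantization error
```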