-
I recently ran a benchmark and recall test on the new halfvec and bit types, and both yielded impressive results.
All my tests were run against public data on https://meta.discourse.org/, a…
-
### 🐛 Describe the bug
Running torch.compile() on this Triton FP8 matmul code:
```python
def run_gemm() -> Tensor:
    x_fp8: Tensor
    w_fp8: Tensor
    x_scale: Tensor
    …
```
-
### 🐛 Describe the bug
```python
import torch
import torch._inductor.config

torch._inductor.config.force_mixed_mm = True

def f(a, b):
    return torch.mm(a, b.to(a.dtype))

fp16_act = torc…
```
-
When I continue pretraining HF models with fp8, I get this error:
TypeError: ComposerHFCausalLM.__init__() got an unexpected keyword argument 'fc_type'
-
This is an umbrella issue for allowing fp8 type(s) in shark, spanning all the required layers of the stack: Turbine, IREE, MLIR, LLVM, including backends of interest like ROCm.
Some initial researc…
-
Hi experts, I tried to use Transformer Engine to measure the FLOPS a 4090 can achieve with fp8. I used te.Linear for my evaluation and got a maximum of only 150+ TFLOPS. For fp16, the maximum is only 80…
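For reference, achieved TFLOPS can be computed from the GEMM shape and elapsed time; a minimal sketch (the shapes and timing below are hypothetical, not measured on a 4090):

```python
def gemm_tflops(m: int, n: int, k: int, seconds: float) -> float:
    # A GEMM of shape (m, k) x (k, n) performs 2*m*n*k floating-point ops
    # (one multiply and one add per inner-product term).
    return 2 * m * n * k / seconds / 1e12

# Hypothetical example: an 8192 x 8192 x 8192 matmul finishing in 7 ms
# works out to roughly 157 TFLOPS.
print(round(gemm_tflops(8192, 8192, 8192, 7e-3)))
```

Comparing this number against the GPU's datasheet peak for the given dtype shows how far the kernel is from the hardware limit.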
-
Thanks for this great project! I have some questions about how you implemented matmul for two MX-format matrices.
This repo appears to provide a simulation of it, but does not provide an actual CUDA impl…
-
I'm doing an A (4-bit) x B (fp16) matmul with large A and small B. I expect it to beat an fp8 matmul (it should be memory-bound).
In reality, it is always worse.
Example:
Kernel code is here: https…
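The memory-bound expectation can be sanity-checked with a quick arithmetic-intensity estimate (the shapes below are hypothetical; 4-bit A is assumed packed at 0.5 bytes/element):

```python
def arithmetic_intensity(m: int, k: int, n: int,
                         bytes_a: float, bytes_b: float, bytes_out: float) -> float:
    # FLOPs for an (m, k) x (k, n) GEMM, divided by total bytes moved
    # (read A, read B, write the output once; caches ignored).
    flops = 2 * m * k * n
    bytes_moved = m * k * bytes_a + k * n * bytes_b + m * n * bytes_out
    return flops / bytes_moved

# Large 4-bit A (0.5 B/elem) x small fp16 B (2 B/elem), fp16 output.
ai_w4 = arithmetic_intensity(8192, 8192, 16, 0.5, 2, 2)
# fp8 x fp8 at the same shape, fp16 output.
ai_fp8 = arithmetic_intensity(8192, 8192, 16, 1, 1, 2)

# With large A dominating traffic, the 4-bit case moves roughly half the
# bytes of fp8, so a memory-bound kernel should run roughly 2x faster.
```

If the measured kernel is instead slower, the bottleneck is likely elsewhere (e.g. dequantization overhead), not DRAM bandwidth.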
-
Hello @AdnanHoque , I am trying to recreate the results from the blog [Accelerating Llama3 FP8 Inference with Triton Kernels](https://pytorch.org/blog/accelerating-llama3/). I haven't been able to get…
mgoin updated
2 months ago
-
I am using a GGUF model on Aphrodite Engine. I want a context length of 8192, but I can only get it to load with about 4096 context; the issue is that I'm short on VRAM...…
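For context, the KV cache is usually what grows with context length; a rough size estimate (the model dimensions below are hypothetical, not taken from this model):

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   ctx_len: int, bytes_per_elem: int = 2) -> int:
    # Keys and values each store n_layers * n_kv_heads * head_dim
    # elements per token, hence the leading factor of 2.
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

# Hypothetical 7B-class model: 32 layers, 32 KV heads, head_dim 128, fp16 cache.
gib = kv_cache_bytes(32, 32, 128, 8192) / 2**30  # 4.0 GiB at 8192 tokens
```

Doubling the context from 4096 to 8192 doubles this footprint, which is why the larger setting can fail to load when VRAM is already tight.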