-
Hi experts, I tried to use Transformer Engine to run an inference test of Llama 2 on an H800 and found that FP8 was much slower than FP16. Below is a small reproduction that only contains `L…
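For reference, since the repro above is cut off, here is a minimal sketch of the kind of comparison being described: a single `te.Linear` layer timed in plain FP16 and again under `te.fp8_autocast`. The shapes, the DelayedScaling recipe, and the timing loop are illustrative assumptions, not the original code:
```python
# Illustrative sketch only (not the original, truncated repro): time one Transformer
# Engine Linear layer forward pass in FP16 vs. FP8 on an H800.
import time

import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe


def bench(fn, iters=100, warmup=10):
    for _ in range(warmup):
        fn()
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters * 1e3  # ms per call


layer = te.Linear(4096, 4096, bias=False, params_dtype=torch.float16).cuda()
x = torch.randn(32, 4096, device="cuda", dtype=torch.float16)
fp8_recipe = recipe.DelayedScaling(fp8_format=recipe.Format.HYBRID)


@torch.no_grad()
def fp16_forward():
    layer(x)


@torch.no_grad()
def fp8_forward():
    with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
        layer(x)


print(f"fp16: {bench(fp16_forward):.3f} ms   fp8: {bench(fp8_forward):.3f} ms")
```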
-
Hello! Thank you very much for this FP8 rowwise matmul code, it's been extremely helpful. However, there is a subtle bug/hidden requirement when, e.g., calling this code here:
https://github.com/pytor…
-
### Report of performance regression
Following the blog post [announcement](https://blog.vllm.ai/2024/07/23/llama31.html), I tried to replica…
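For context, a minimal sketch of the kind of offline throughput check such a replication might use via the `vllm` Python API; the model name, prompts, and sampling settings below are illustrative assumptions, not the exact setup (which is cut off above):
```python
# Illustrative sketch only: offline throughput measurement with vLLM for an
# FP8-quantized Llama 3.1 run; swap in the checkpoint and settings actually used.
import time

from vllm import LLM, SamplingParams

prompts = ["Summarize the history of the transformer architecture."] * 256
params = SamplingParams(temperature=0.0, max_tokens=128)

# Assumed model name; the regression report may use a different checkpoint.
llm = LLM(model="meta-llama/Meta-Llama-3.1-8B-Instruct", quantization="fp8")

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

generated = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{generated / elapsed:.1f} output tokens/s")
```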
-
Could you share a rough timeline for FP8 quantization support for the Mixtral (MoE) model?
cc: @Tracin
-
**Describe the bug**
Adding `"zero_quantized_weights": true,` leads to a crash:
```
[35:1]: warnings.warn(
[35:1]:Traceback (most recent call last):
[35:1]: File "/data/env/lib/repos/retro-l…
```
-
I tried Flux training on a 2080 Ti with 22GB of VRAM, but I keep getting an error:
```
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: Ex…
```
-
### Request description
The scale parameter was added to the AttentionOp/OnlineAttentionOp as a stopgap solution to make models work. Now that we are in a better place to support attention, it's time…
-
I really like the simplicity of TK and think it could be broadly applicable to kernel authoring beyond attention. Has there been any benchmarking of pure GEMM operations? If so, an example would …
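For reference, here is a minimal sketch of the kind of pure-GEMM baseline such a comparison would be measured against, timing cuBLAS through `torch.matmul` with CUDA events; the sizes, dtype, and iteration counts are illustrative, and a TK GEMM kernel would be timed the same way:
```python
# Illustrative baseline only: time a BF16 matmul (cuBLAS via torch.matmul) and
# report achieved TFLOP/s, the same metric a TK GEMM kernel would be compared on.
import torch


def bench_gemm(m: int, n: int, k: int, iters: int = 50) -> float:
    a = torch.randn(m, k, device="cuda", dtype=torch.bfloat16)
    b = torch.randn(k, n, device="cuda", dtype=torch.bfloat16)
    for _ in range(10):  # warmup
        torch.matmul(a, b)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        torch.matmul(a, b)
    end.record()
    torch.cuda.synchronize()
    ms_per_iter = start.elapsed_time(end) / iters
    return 2 * m * n * k / (ms_per_iter * 1e-3) / 1e12  # TFLOP/s


for size in (4096, 8192):
    print(f"{size}x{size}x{size} GEMM: {bench_gemm(size, size, size):.1f} TFLOP/s")
```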
-
Description:
I set the weight/activation type to QuantType.QFLOAT8E4M3FN when calling quantize_static, but I get the following errors:
```
Traceback (most recent call last):
  File "/home/developer/wor…
```
-
Can we make `fbgemm-gpu` an optional dependency? https://pypi.org/project/fbgemm-gpu/#files It doesn't look like it's supported on macOS (https://github.com/pytorch/FBGEMM/issues/1985). This means…
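For illustration, a minimal sketch of the kind of optional-import guard this would enable, so that code paths not needing the GPU ops still work where no `fbgemm-gpu` wheel exists (e.g. macOS); the flag and helper names below are made up for the example:
```python
# Illustrative sketch: treat fbgemm_gpu as an optional dependency by importing it
# lazily and raising a clear error only when a code path actually needs its ops.
try:
    import fbgemm_gpu  # noqa: F401  # registers the CUDA/TBE ops when available
    HAS_FBGEMM_GPU = True
except ImportError:
    HAS_FBGEMM_GPU = False


def require_fbgemm_gpu(feature: str) -> None:
    """Raise a helpful error when an fbgemm-gpu-backed feature is requested."""
    if not HAS_FBGEMM_GPU:
        raise RuntimeError(
            f"{feature} needs the optional dependency 'fbgemm-gpu', which is not "
            "installed (no wheels are published for this platform). Run it on a "
            "supported Linux/CUDA environment instead."
        )
```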