-
ENV: 8× RTX 4090
I want to test FP8 inference with TransformerEngine on Llama 3 (from Hugging Face), but I cannot find any instructions for inference. Can you share some code?
Thank you~
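Not an official recipe, but a minimal sketch of what FP8 inference with TransformerEngine's PyTorch API looks like; the sizes are placeholders, and the Llama-specific wiring (replacing the HF model's linear layers with TE modules) is left out:
```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling, Format

# HYBRID recipe: E4M3 for forward tensors, E5M2 for gradients
# (gradients are unused in pure inference).
fp8_recipe = DelayedScaling(fp8_format=Format.HYBRID)

# Placeholder layer; a real Llama 3 run would swap the HF model's
# nn.Linear modules for te.Linear (or use te.TransformerLayer blocks).
layer = te.Linear(4096, 4096, bias=True, params_dtype=torch.bfloat16).cuda().eval()
x = torch.randn(16, 4096, device="cuda", dtype=torch.bfloat16)

with torch.no_grad(), te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)  # the GEMM executes in FP8 inside this context

print(y.dtype)  # torch.bfloat16; FP8 is used internally for the matmul
```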
-
`jax.stages.Lowered` provides readable StableHLO text output, e.g.
```mlir
module @jit_matmul_fn attributes {mhlo.num_partitions = 1 : i32, mhlo.num_replicas = 1 : i32} {
func.fun…
```
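For context, a sketch of how such text is obtained; the body of `matmul_fn` and the shapes are assumptions here, only the function name comes from the module header above:
```python
import jax
import jax.numpy as jnp

def matmul_fn(a, b):
    return a @ b

a = jnp.ones((4, 8), jnp.float32)
b = jnp.ones((8, 4), jnp.float32)

# .lower() returns a jax.stages.Lowered; .as_text() yields the readable
# StableHLO module text shown above.
lowered = jax.jit(matmul_fn).lower(a, b)
print(lowered.as_text())
```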
-
Hi,
when I try to implement a cuBLASLt FP8 batched GEMM with bias based on LtFp8Matmul, I run into the following problem.
```
[2024-05-22 07:06:23][cublasLt][62029][Error][cublasLtMatmulAlgoGetHeuristic] Failed t…
```
-
Hello @mgoin, it's a pleasant surprise to discover this project. Thank you for your contributions to BitBLAS. We have recently added support for FP8 Matmul and hope it will be useful for this project.
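For readers landing here, a rough sketch following BitBLAS's `MatmulConfig`/`Matmul` pattern; the `e4m3_float8` dtype strings and the other parameters are assumptions, not verified against a specific BitBLAS release:
```python
# Rough sketch only: the MatmulConfig/Matmul pattern is BitBLAS's
# documented interface, but the "e4m3_float8" dtype strings below are
# assumptions and may differ by version.
import bitblas

config = bitblas.MatmulConfig(
    M=16,
    N=1024,
    K=1024,
    A_dtype="e4m3_float8",   # assumed FP8 dtype string
    W_dtype="e4m3_float8",   # assumed FP8 dtype string
    accum_dtype="float32",
    out_dtype="float16",
    layout="nt",
    with_bias=False,
)
matmul = bitblas.Matmul(config=config)
```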
-
### System Info
CPU: x86
GPU: H100
Server: XE9640
Code: TensorRT-LLM 0.8.0 release
### Who can help?
@Tracin @juney-nvidia
Regarding the [FP8 Post Quantization](https://github.com/NVIDIA/Tenso…
-
**What is your question?**
Why does the CUDA Toolkit only provide a double2fp8 implementation for conversion to FP8, while CUTLASS only provides float2fp8?
For FP16 and FP32, the CUDA Toolk…
-
### Motivation
This is an interesting blog post [FireAttention V2: 12x faster to make Long Contexts practical for Online Inference](https://fireworks.ai/blog/fireattention-v2-long-context-inference…
-
Hi,
Has anyone tried running OpenMM at floating-point precision lower than FP32? Can one still run simulations in FP16 or FP8? Which operations could ideally be moved to lower precision?
Thanks!
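For context, precision on OpenMM's GPU platforms is selected through the platform's `Precision` property; a minimal sketch (the two-particle system is just a placeholder):
```python
import openmm
from openmm import unit

# Minimal placeholder system, only to show where precision is selected.
system = openmm.System()
system.addParticle(1.0 * unit.amu)
system.addParticle(1.0 * unit.amu)

integrator = openmm.VerletIntegrator(1.0 * unit.femtoseconds)
platform = openmm.Platform.getPlatformByName("CUDA")

# The CUDA/OpenCL platforms accept 'single', 'mixed', or 'double' here;
# 'mixed' computes forces in single and integrates in double. There is
# no FP16 or FP8 mode in current releases.
context = openmm.Context(system, integrator, platform, {"Precision": "mixed"})
```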
-
In megatron/core/models/gpt/gpt_layer_specs.py, line 95 reads `linear_fc1=TELayerNormColumnParallelLinear if use_te else ColumnParallelLinear`. Why is it TELayerNormColumnParallelLinear …
-
Is it planned?
Currently getting this error when trying to run ComfyUI in fp8 (flags `--fp8_e4m3fn-text-enc --fp8_e4m3fn-unet`):
```
RuntimeError: "addmm_cuda" not implemented for 'Float8_e4m3fn'…
```
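For what it's worth, the error is straightforward to reproduce outside ComfyUI: recent PyTorch builds can store `float8_e4m3fn` tensors, but the regular matmul kernels have no FP8 implementation, so weights must be upcast (or routed through a dedicated scaled-FP8 GEMM) before the linear layer runs. A minimal repro, assuming a CUDA build of a recent PyTorch:
```python
import torch
import torch.nn.functional as F

# float8_e4m3fn is a storage dtype; the standard GEMM kernels are not
# implemented for it.
a = torch.randn(8, 16, device="cuda").to(torch.float8_e4m3fn)
w = torch.randn(32, 16, device="cuda").to(torch.float8_e4m3fn)
bias = torch.randn(32, device="cuda").to(torch.float8_e4m3fn)

try:
    F.linear(a, w, bias)  # hits the addmm path
except RuntimeError as e:
    print(e)  # "addmm_cuda" not implemented for 'Float8_e4m3fn'

# Upcasting first works: FP8 then serves purely as a storage format.
out = F.linear(a.to(torch.float16), w.to(torch.float16), bias.to(torch.float16))
```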