-
ENV: 8× RTX 4090
I want to test FP8 inference with TransformerEngine on Llama 3 (from Hugging Face), but I cannot find any instructions for inference. Can you share some code?
Thank you~
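Not an official recipe, but a minimal sketch of what FP8 inference with TransformerEngine's PyTorch API looks like; the sizes are placeholders, and the Llama-specific wiring (replacing the HF model's linear layers with TE modules) is left out:
```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling, Format

# HYBRID recipe: E4M3 for forward tensors, E5M2 for gradients
# (gradients are unused in pure inference).
fp8_recipe = DelayedScaling(fp8_format=Format.HYBRID)

# Placeholder layer; a real Llama 3 run would swap the HF model's
# nn.Linear modules for te.Linear (or use te.TransformerLayer blocks).
layer = te.Linear(4096, 4096, bias=True, params_dtype=torch.bfloat16).cuda().eval()
x = torch.randn(16, 4096, device="cuda", dtype=torch.bfloat16)

with torch.no_grad(), te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)  # the GEMM executes in FP8 inside this context

print(y.dtype)  # torch.bfloat16; FP8 is used internally for the matmul
```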
-
`jax.stages.Lowered` provides readable StableHLO text output, e.g.
```mlir
module @jit_matmul_fn attributes {mhlo.num_partitions = 1 : i32, mhlo.num_replicas = 1 : i32} {
func.fun…
```
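For context, a sketch of how such text is obtained; the body of `matmul_fn` and the shapes are assumptions here, only the function name comes from the module header above:
```python
import jax
import jax.numpy as jnp

def matmul_fn(a, b):
    return a @ b

a = jnp.ones((4, 8), jnp.float32)
b = jnp.ones((8, 4), jnp.float32)

# .lower() returns a jax.stages.Lowered; .as_text() yields the readable
# StableHLO module text shown above.
lowered = jax.jit(matmul_fn).lower(a, b)
print(lowered.as_text())
```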
-
Hi,
when I try to implement a cuBLASLt FP8 batched GEMM with bias based on LtFp8Matmul, I run into the following problem.
```
[2024-05-22 07:06:23][cublasLt][62029][Error][cublasLtMatmulAlgoGetHeuristic] Failed t…
```
-
Hello @mgoin, it's a pleasant surprise to discover this project. Thank you for your contributions to BitBLAS. We have recently added support for FP8 Matmul and hope it will be useful for this project.
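For readers landing here, a rough sketch following BitBLAS's `MatmulConfig`/`Matmul` pattern; the `e4m3_float8` dtype strings and the other parameters are assumptions, not verified against a specific BitBLAS release:
```python
# Rough sketch only: the MatmulConfig/Matmul pattern is BitBLAS's
# documented interface, but the "e4m3_float8" dtype strings below are
# assumptions and may differ by version.
import bitblas

config = bitblas.MatmulConfig(
    M=16,
    N=1024,
    K=1024,
    A_dtype="e4m3_float8",   # assumed FP8 dtype string
    W_dtype="e4m3_float8",   # assumed FP8 dtype string
    accum_dtype="float32",
    out_dtype="float16",
    layout="nt",
    with_bias=False,
)
matmul = bitblas.Matmul(config=config)
```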
-
### System Info
CPU: x86
GPU: H100
Server: XE9640
Code: TensorRT-LLM 0.8.0 release
### Who can help?
@Tracin @juney-nvidia
Regarding the [FP8 Post Quantization](https://github.com/NVIDIA/Tenso…
-
**What is your question?**
Why does the CUDA Toolkit only provide a double2fp8 implementation for conversion to FP8, while CUTLASS only provides float2fp8?
For FP16 and FP32, the CUDA Toolk…
-
### Motivation
This is an interesting blog post [FireAttention V2: 12x faster to make Long Contexts practical for Online Inference](https://fireworks.ai/blog/fireattention-v2-long-context-inference…
-
Hi,
Has anyone tried running OpenMM at floating-point precision lower than FP32? Can one still run simulations in FP16 or FP8? Which operations could ideally be moved to lower precision?
Thanks!
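For context, precision on OpenMM's GPU platforms is selected through the platform's `Precision` property; a minimal sketch (the two-particle system is just a placeholder):
```python
import openmm
from openmm import unit

# Minimal placeholder system, only to show where precision is selected.
system = openmm.System()
system.addParticle(1.0 * unit.amu)
system.addParticle(1.0 * unit.amu)

integrator = openmm.VerletIntegrator(1.0 * unit.femtoseconds)
platform = openmm.Platform.getPlatformByName("CUDA")

# The CUDA/OpenCL platforms accept 'single', 'mixed', or 'double' here;
# 'mixed' computes forces in single and integrates in double. There is
# no FP16 or FP8 mode in current releases.
context = openmm.Context(system, integrator, platform, {"Precision": "mixed"})
```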
-
In megatron/core/models/gpt/gpt_layer_specs.py, line 95 reads `linear_fc1=TELayerNormColumnParallelLinear if use_te else ColumnParallelLinear`. Why is it TELayerNormColumnParallelLinear …
-
Is it planned?
Currently getting this error when trying to run ComfyUI in fp8 (flags `--fp8_e4m3fn-text-enc --fp8_e4m3fn-unet`):
```
RuntimeError: "addmm_cuda" not implemented for 'Float8_e4m3fn'…
```
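For what it's worth, the error is straightforward to reproduce outside ComfyUI: recent PyTorch builds can store `float8_e4m3fn` tensors, but the regular matmul kernels have no FP8 implementation, so weights must be upcast (or routed through a dedicated scaled-FP8 GEMM) before the linear layer runs. A minimal repro, assuming a CUDA build of a recent PyTorch:
```python
import torch
import torch.nn.functional as F

# float8_e4m3fn is a storage dtype; the standard GEMM kernels are not
# implemented for it.
a = torch.randn(8, 16, device="cuda").to(torch.float8_e4m3fn)
w = torch.randn(32, 16, device="cuda").to(torch.float8_e4m3fn)
bias = torch.randn(32, device="cuda").to(torch.float8_e4m3fn)

try:
    F.linear(a, w, bias)  # hits the addmm path
except RuntimeError as e:
    print(e)  # "addmm_cuda" not implemented for 'Float8_e4m3fn'

# Upcasting first works: FP8 then serves purely as a storage format.
out = F.linear(a.to(torch.float16), w.to(torch.float16), bias.to(torch.float16))
```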