-
### Your question
I got:
Total VRAM 8188 MB, total RAM 16011 MB
pytorch version: 2.3.1+cu121
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 4060 Laptop GPU : cudaMallocAsync
…
-
Hi, how do I cast a float/bfloat16 tensor to fp8? I want to perform W8A8 (fp8) quantization, but I couldn't find an example of quantizing activations to the FP8 format.
-
### System Info
```shell
Optimum-habana v1.13.2
HL-SMI: hl-1.17.1-fw-51.5.0
Driver: 1.17.1-78932ae
```
### Information
- [X] The official example scripts
- [ ] My own modified scripts
### Tasks…
-
The CUDA extended floating point types [`__half`](https://docs.nvidia.com/cuda/cuda-math-api/struct____half.html#struct____half) and [`__nv_bfloat16`](https://docs.nvidia.com/cuda/cuda-math-api/struct…
-
### Your current environment
The output of `python collect_env.py`
```text
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N…
-
### 🚀 The feature, motivation and pitch
3D fp8 matrix multiplication can be useful for fp8 model with 3D matmul (it also can be used to improve accuracy of models with 2D fp8 quantized matrix multi…
-
`Flux diffusion model implementation using quantized fp8 matmul & remaining layers use faster half precision accumulate, which is ~2x faster on consumer devices.`
Hello there!
Thanks for sharing you…
-
Hello team,
we have been debugging large-scale training instabilities with FP8 and noticed that these started when updating from transformer-engine v1.2.1 to v1.7. Taking a closer look at the traini…
-
### Your current environment
The output of `python collect_env.py`
```text
Collecting environment information...
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch…
-
```
Some parameters are on the meta device device because they were offloaded to the cpu.
Quantizing weights: 0%| | 0/1771 [00:00