-
I just want to share some anecdotal information, from a user perspective, in case you have issues with some LoRAs.
I had issues running this large LoRA (1.28 GB) from Civitai with Flux FP8.
http…
-
The original version works without problems: https://github.com/balazik/ComfyUI-PuLID-Flux.git
-
Since Ada GPUs like the 4090 restrict FP8 arithmetic to `fp32` accumulation, they only achieve the same peak `TFLOPs` as `fp16xfp16` with `fp16` accumulation.
Furthermore, according to my tests,…
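For what it's worth, the comparison I have in mind looks roughly like the micro-benchmark below. It uses PyTorch's private `torch._scaled_mm` for the FP8 path; the call signature follows the 2.4-era API and may differ in other releases, and the shapes, trivial scales, and timing helper are only illustrative.

```python
import torch

def bench(fn, iters=50):
    # Simple CUDA-event timer; warm up first so one-time costs are excluded.
    for _ in range(5):
        fn()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # milliseconds per call

M = N = K = 8192
a16 = torch.randn(M, K, device="cuda", dtype=torch.float16)
b16 = torch.randn(K, N, device="cuda", dtype=torch.float16)

# FP8 operands: _scaled_mm wants a row-major first operand and a column-major second one.
a8 = a16.to(torch.float8_e4m3fn)
b8 = b16.t().contiguous().to(torch.float8_e4m3fn).t()
one = torch.tensor(1.0, device="cuda")  # trivial per-tensor scales

t_fp16 = bench(lambda: a16 @ b16)
t_fp8 = bench(lambda: torch._scaled_mm(a8, b8, scale_a=one, scale_b=one,
                                       out_dtype=torch.float16))
t_fp8_fast = bench(lambda: torch._scaled_mm(a8, b8, scale_a=one, scale_b=one,
                                            out_dtype=torch.float16, use_fast_accum=True))
print(f"fp16: {t_fp16:.2f} ms | fp8 (fp32 acc): {t_fp8:.2f} ms | fp8 (fast acc): {t_fp8_fast:.2f} ms")
```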
-
Hello, it looks like EmbeddingBagCollection forces the data type to be float32 or float16 during initialization.
https://github.com/pytorch/torchrec/blob/main/torchrec/modules/embedding_modules.py#L179
…
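A minimal sketch of what I am running into (the top-level imports and constructor arguments follow my reading of the torchrec docs, so treat them as assumptions):

```python
import torch
from torchrec import EmbeddingBagCollection, EmbeddingBagConfig

# One small table; I would like its weights in a dtype other than float32/float16,
# but the dtype chosen at the linked initialization code is what I actually get.
tables = [
    EmbeddingBagConfig(
        name="t1",
        embedding_dim=64,
        num_embeddings=1000,
        feature_names=["f1"],
    )
]
ebc = EmbeddingBagCollection(tables=tables, device=torch.device("cpu"))

for name, param in ebc.named_parameters():
    print(name, param.dtype)  # float32 here; float16 is the only other option
```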
-
### System Info
CPU: x86_64
GPU: NVIDIA L20
TensorRT branch: v0.13.0
CUDA: 12.5 (Driver Version: 535.161.07)
### Who can help?
@kaiyux @byshiue
### Information…
-
`Flux diffusion model implementation using quantized fp8 matmul & remaining layers use faster half precision accumulate, which is ~2x faster on consumer devices.`
Hello there!
Thanks for sharing you…
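For readers who, like me, were curious what the quoted approach looks like in practice, here is a rough sketch of the idea as I understand it, not the repo's actual code: fp8-quantized linear layers via PyTorch's private `torch._scaled_mm`, with reduced-precision accumulation enabled for the remaining fp16 GEMMs. All names, scales, and shapes below are illustrative.

```python
import torch
import torch.nn as nn

# Let the remaining (non-quantized) fp16 GEMMs use reduced-precision accumulation
# where the backend supports it: the "faster half precision accumulate" part.
torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = True

class FP8Linear(nn.Module):
    """Illustrative drop-in for nn.Linear with the weight stored in fp8."""

    def __init__(self, linear: nn.Linear):
        super().__init__()
        # Trivial per-tensor scales; a real implementation would calibrate these.
        self.register_buffer("scale", torch.tensor(1.0))
        self.register_buffer("weight", linear.weight.detach().to(torch.float8_e4m3fn))
        self.bias = linear.bias

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # _scaled_mm wants a row-major activation and a column-major weight,
        # and shapes divisible by 16; 2-D activations are assumed here.
        y = torch._scaled_mm(
            x.to(torch.float8_e4m3fn), self.weight.t(),
            scale_a=self.scale, scale_b=self.scale, out_dtype=torch.float16,
        )
        return y if self.bias is None else y + self.bias

layer = FP8Linear(nn.Linear(4096, 4096, dtype=torch.float16)).cuda()
print(layer(torch.randn(16, 4096, device="cuda", dtype=torch.float16)).shape)
```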
-
I would like to ask about the exact speed benchmarking configuration used in the paper, as it is not mentioned. I tested the kernel on an RTX 4090 with PyTorch 2.4 (cu118) and the corresponding Triton version. The resul…
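For concreteness, the harness below is roughly what I use on my side; the shapes are illustrative, and the `a16 @ b16` call just stands in for the kernel under test.

```python
import torch
import triton

M = N = K = 4096  # illustrative shapes; the paper's shapes are what I am asking about

a16 = torch.randn(M, K, device="cuda", dtype=torch.float16)
b16 = torch.randn(K, N, device="cuda", dtype=torch.float16)

# Replace `a16 @ b16` with the kernel under test; do_bench handles warm-up and timing.
ms = triton.testing.do_bench(lambda: a16 @ b16, warmup=25, rep=100)
tflops = 2 * M * N * K / (ms * 1e-3) / 1e12
print(f"{ms:.3f} ms  ->  {tflops:.1f} TFLOPS")
```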
-
### Anything you want to discuss about vllm.
I am trying to run a serving performance test using pipeline parallelism with the Llama 3.1 405B model and an 8B draft model, but the model fails t…
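For reference, the configuration I am aiming at is roughly the following. It is written with the offline `LLM` entry point purely to show the engine arguments (the actual test goes through the OpenAI-compatible server), and the argument names, parallel sizes, and model IDs are my reading of the docs for this version, so treat them as assumptions.

```python
from vllm import LLM, SamplingParams

# Target model split across pipeline stages, plus a small draft model for
# speculative decoding; this is the combination that fails for me.
llm = LLM(
    model="meta-llama/Llama-3.1-405B-Instruct",
    tensor_parallel_size=8,
    pipeline_parallel_size=2,
    speculative_model="meta-llama/Llama-3.1-8B-Instruct",
    num_speculative_tokens=5,
)

print(llm.generate(["Hello"], SamplingParams(max_tokens=16)))
```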
-
I'm doing a matmul with A in 4-bit and B in fp16, with a large A and a small B. I expect it to beat an fp8 matmul (it should be memory-bound).
In reality, it always seems to be slower.
Example:
Kernel code is here: https…
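To make the memory-bound expectation concrete, this is the back-of-the-envelope byte count I have in mind (pure arithmetic, illustrative shapes only):

```python
# Bytes that have to move for C = A @ B with a large A (M x K) and a small B (K x N).
M, K, N = 65536, 4096, 64  # illustrative: A dominates the traffic

def traffic_bytes(bytes_a, bytes_b, bytes_c=2):  # fp16 output
    return M * K * bytes_a + K * N * bytes_b + M * N * bytes_c

int4_fp16 = traffic_bytes(0.5, 2)  # A in 4 bit, B in fp16
fp8_fp8 = traffic_bytes(1, 1)      # both operands in fp8

# With A dominating, the int4 kernel reads roughly half the bytes of the fp8 one,
# so if both are memory-bound it should be faster, not slower.
print(f"{int4_fp16 / 1e6:.1f} MB vs {fp8_fp8 / 1e6:.1f} MB")
```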
-
Hello @mgoin, it's a pleasant surprise to discover this project. Thank you for your contributions to BitBLAS. We have recently added support for FP8 Matmul, hoping it will help this project.
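For anyone curious, a rough sketch of how the FP8 path can be driven, following the quick-start pattern; the exact dtype strings and whether `transform_weight` is required here are my reading of the BitBLAS docs, so treat the details as assumptions rather than the definitive API.

```python
import torch
import bitblas

# FP8 x FP8 -> FP16 GEMM through BitBLAS, with fp32 accumulation.
config = bitblas.MatmulConfig(
    M=16, N=1024, K=1024,
    A_dtype="e4m3_float8",
    W_dtype="e4m3_float8",
    accum_dtype="float32",
    out_dtype="float16",
    layout="nt",  # activation row-major, weight stored as (N, K)
)
matmul = bitblas.Matmul(config=config)

a = torch.randn(16, 1024, device="cuda", dtype=torch.float16).to(torch.float8_e4m3fn)
w = torch.randn(1024, 1024, device="cuda", dtype=torch.float16).to(torch.float8_e4m3fn)
out = matmul(a, matmul.transform_weight(w))
print(out.shape, out.dtype)
```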