-
NF4 model at 1024 × 1024 resolution on a 10-series or 20-series 8 GB graphics card: generating a single image takes four minutes.
-
### Your current environment
vllm 0.5.4
### 🐛 Describe the bug
AutoAWQ's Marlin format must be used without a zero point, but in vLLM:
```python
def query_marlin_supported_quant_types(has_zp: bool,
…
```
-
Hello everyone,
First off, a big thanks to city96 for the awesome work they've been contributing to the community. It's been incredibly helpful!
Here are my system specs:
Processor: Intel i5-13…
-
I have a few questions about the inference efficiency of DeepSeek-V2:
1.
> In order to efficiently deploy DeepSeek-V2 for service, we first convert its parameters into the precision of FP8.
Ar…
-
Loss is NaN in the very first training batch when the transformer architecture uses [rotary embedding](https://github.com/lucidrains/rotary-embedding-torch).
-
A | B | C | D | Compute(Scale)
-- | -- | -- | -- | --
fp32 | fp32 | fp32 | fp32 | fp32
fp16 | fp16 | fp16 | fp16 | fp32
fp16 | fp16 | fp16 | fp32 | fp32
bf16 | bf16 | bf16 | bf16 | fp32
fp8/bf…
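One way to read rows such as `fp16 | fp16 | fp16 | fp32 | fp32` is the usual BLAS-style GEMM D = α·(A·B) + β·C, where A, B, C and D are storage types and Compute(Scale) is the type used for accumulation and for α/β. That reading is an assumption here, not something the table itself states. The pure-Python sketch below (the `gemm` helper is hypothetical) emulates it by rounding every intermediate through `struct` half- and single-precision packing.

```python
import struct

def to_fp16(x: float) -> float:
    """Round to the nearest IEEE binary16 value (struct raises OverflowError past the fp16 range)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_fp32(x: float) -> float:
    """Round to the nearest IEEE binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

ROUND = {"fp16": to_fp16, "fp32": to_fp32}

def gemm(A, B, C, alpha=1.0, beta=0.0,
         ab_type="fp16", c_type="fp16", d_type="fp16", compute="fp32"):
    """Hypothetical helper: emulate D = alpha * (A @ B) + beta * C.

    ab_type/c_type/d_type are assumed storage types; compute is assumed to be
    the accumulation and alpha/beta (scale) type. Matrices are lists of rows.
    """
    rnd_ab, rnd_c = ROUND[ab_type], ROUND[c_type]
    rnd_d, rnd_acc = ROUND[d_type], ROUND[compute]
    m, k, n = len(A), len(B), len(B[0])
    D = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                # multiply the stored inputs, then round product and sum to the compute type
                prod = rnd_acc(rnd_ab(A[i][p]) * rnd_ab(B[p][j]))
                acc = rnd_acc(acc + prod)
            # epilogue: alpha*acc + beta*C formed in double, rounded once to the
            # compute type, then rounded again to the D storage type
            out = rnd_acc(rnd_acc(alpha) * acc + rnd_acc(beta) * rnd_c(C[i][j]))
            D[i][j] = rnd_d(out)
    return D

# Usage: 3000 unit products, accumulated in fp32 versus an fp16 contrast case
A = [[1.0] * 3000]
B = [[1.0] for _ in range(3000)]
C = [[0.0]]
print(gemm(A, B, C, d_type="fp32", compute="fp32"))
print(gemm(A, B, C, d_type="fp32", compute="fp16"))
```

The `compute="fp16"` call is a contrast case that does not appear in the table. With an fp16 accumulator the sum stalls at 2048, because above 2048 the fp16 spacing is 2 and 2048 + 1 rounds back to 2048 under ties-to-even, while the fp32 compute type reaches 3000. This illustrates why mixed-precision rows typically keep Compute(Scale) at fp32.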
-
https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/pull/2506/files
This PR had to disable FP8 tests for the CPU backend.
The reference implementation performs a Float -> FP8 -> Float conversion, but C…
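A Float -> FP8 -> Float round trip can be emulated in plain Python. This is a minimal sketch, not MIGraphX's reference implementation: it assumes the E4M3FN variant (exponent bias 7, 3 mantissa bits, largest finite value 448, smallest normal 2^-6, no infinities), round-to-nearest-even, and saturation of out-of-range values to ±448 by default. The helper name `fp8_e4m3fn_roundtrip` and the `saturate` flag are hypothetical.

```python
import math

FP8_E4M3FN_MAX = 448.0    # assumed largest finite E4M3FN value: 1.75 * 2**8
FP8_E4M3FN_MIN_EXP = -6   # assumed exponent of the smallest normal value, 2**-6
MANTISSA_BITS = 3

def fp8_e4m3fn_roundtrip(x: float, saturate: bool = True) -> float:
    """Hypothetical helper: return x after Float -> FP8 (E4M3FN) -> Float."""
    if math.isnan(x):
        return math.nan
    if math.isinf(x):
        # E4M3FN has no infinities: either saturate or produce NaN
        return math.copysign(FP8_E4M3FN_MAX, x) if saturate else math.nan
    if x == 0.0:
        return x  # keeps the sign of zero
    a = abs(x)
    _, e = math.frexp(a)                         # a = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, FP8_E4M3FN_MIN_EXP)         # below 2**-6 use the fixed subnormal spacing
    step = 2.0 ** (exp - MANTISSA_BITS)          # gap between representable values here
    q = round(a / step) * step                   # built-in round() on a float ties to even
    if q > FP8_E4M3FN_MAX:
        if not saturate:
            return math.nan
        q = FP8_E4M3FN_MAX
    return math.copysign(q, x)

for v in (1.0, 0.1, 1.0625, 1000.0, 2.0 ** -10):
    print(v, "->", fp8_e4m3fn_roundtrip(v))
```

A value such as 0.1 comes back as 0.1015625, so a path that round-trips through FP8 will generally not match one that keeps full float precision; comparisons between the two need a tolerance on the order of the FP8 spacing.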
-
### Feature Idea
I saw the claim in this Reddit thread; hopefully the ideas there can also be brought into ComfyUI for further speedups.
https://www.reddit.com/r/StableDiffusion/comments/1ex64jj/i_m…
-
I'm wondering whether there is a more correct and a less correct way to do this.
Should I add swiper before I add the database to an on-disk project? Will it filter everything then?
Or do I still need to add filt…
-
### Is there an existing issue for this problem?
- [X] I have searched the existing issues
### Operating system
Windows
### GPU vendor
Nvidia (CUDA)
### GPU model
RTX 3060
### GPU VRAM
12GB
…