-
When I ran the quantize code for llama3-70b-instruct, it was successful, but when I used vLLM to load the quantized model, I got a warning: `awq quantization is not fully optimized yet. The speed can be slower …
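For context, AWQ-style schemes store weights as low-bit integers with a scale and zero point per group. A minimal sketch of that idea in plain Python (function names are illustrative; real AWQ works on tensors and also uses activation statistics to pick scales):

```python
# Hypothetical sketch of group-wise asymmetric quantization, the storage
# format behind AWQ-style 4-bit weights. Not vLLM's or AutoAWQ's actual code.
def quantize_group(weights, n_bits=4):
    """Map a group of floats to ints in [0, 2**n_bits - 1] plus (scale, zero_point)."""
    qmax = (1 << n_bits) - 1
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / qmax or 1.0  # avoid div-by-zero for constant groups
    zero_point = round(-w_min / scale)
    q = [max(0, min(qmax, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize_group(q, scale, zero_point):
    """Recover approximate float weights from the quantized group."""
    return [(qi - zero_point) * scale for qi in q]
```

The round trip loses at most half a quantization step per weight, which is why the model still works but may run through slower, less-optimized kernels.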
-
Feature to automate a variety of tasks associated with training a predictive machine learning model to generate market forecasts given a set of input signals. In general, this aims to be a sandbox …
-
### Motivation
Take internvl as an example: its vision model is 6B. If the vision model could be quantized, inference could run on a single 4090.
May I ask why the vision model doesn't currently support quantization — is it because the feature hasn't yet …
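The VRAM motivation is easy to check with back-of-the-envelope arithmetic: weight memory is roughly parameters × bits ÷ 8. A small sketch (weights only; activations, the KV cache, and the language model itself add more on top):

```python
# Rough weight-only VRAM estimate for a model at a given precision.
# Illustrative helper, not part of any library.
def weight_memory_gib(n_params: float, bits_per_param: int) -> float:
    """Approximate weight memory footprint in GiB."""
    return n_params * bits_per_param / 8 / (1024 ** 3)

# For a 6B-parameter vision tower:
#   fp16 -> ~11.2 GiB, int8 -> ~5.6 GiB, int4 -> ~2.8 GiB
```

At fp16 the vision tower alone takes roughly half of a 4090's 24 GB, which is why 4-bit quantization would make single-GPU inference plausible.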
-
### Motivation
At a high level, we at Neural Magic are writing a custom compiler for Torch Dynamo to define a system within vLLM where we can write graph transformations. The main goal is a separa…
-
Browser & OS (see also https://www.whatismybrowser.com/): N/A; Windows
## Describe the bug
I'm getting `HTTPError: 504 Server Error: Gateway Time-out for url: https://zenodo.org/api/files/38a5a2f8-…
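Transient 504s from a gateway are often worth retrying with exponential backoff before treating the download as failed. A hypothetical sketch (the `fetch` callable and parameter names are illustrative, not part of any library's API):

```python
import time

# Retry a flaky operation with exponential backoff: 1s, 2s, 4s, ...
# `fetch` is any zero-argument callable that raises on failure
# (e.g. a wrapper that calls response.raise_for_status()).
def fetch_with_retry(fetch, attempts=4, base_delay=1.0, sleep=time.sleep):
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries; surface the last error
            sleep(base_delay * (2 ** attempt))
```

Injecting `sleep` as a parameter keeps the helper testable without real delays.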
-
### Problem
Currently, BlockWALService persists data blocks in parallel and reports success to the upper layer as soon as any data block is persisted, even if the previous data block ha…
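One common way to address this is to decouple persistence order from acknowledgement order: blocks still persist in parallel, but the caller is only acked up to the highest contiguous persisted sequence number. A hypothetical sketch (class and method names are illustrative, not BlockWALService's actual API):

```python
import heapq

class InOrderAcker:
    """Ack completions only once every earlier sequence number has persisted."""

    def __init__(self):
        self._next = 0   # lowest sequence number not yet acked
        self._done = []  # min-heap of persisted-but-not-yet-ackable seqnos

    def persisted(self, seqno):
        """Record that `seqno` persisted; return the seqnos acked as a result."""
        heapq.heappush(self._done, seqno)
        acked = []
        # Drain the contiguous prefix starting at self._next.
        while self._done and self._done[0] == self._next:
            acked.append(heapq.heappop(self._done))
            self._next += 1
        return acked
```

With this structure, an out-of-order completion is buffered and only released once the gap before it closes, so the upper layer never sees success for block N while block N-1 is still in flight.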
-
### Is there an existing issue for this?
- [X] I have searched the existing issues and checked the recent builds/commits
### What happened?
https://github.com/AUTOMATIC1111/stable-diffusion…
-
Here is my setup, using Ubuntu:
AMD 6800 XT 16GB VRAM
32GB RAM
Python version: 3.10.12
PyTorch version: 2.2.1+rocm5.7
I am getting between 14s and 15s/it with flux1-dev-Q2_K.gguf, also Q4_0 and Q6_…
-
### Reminder
- [X] I have read the README and searched the existing issues.
### System Info
-
### Reproduction
-
### Expected behavior
Currently, the contamination-free packaging method is supp…
-
**Is your feature request related to a problem? Please describe.**
Last year I wrote [a long article on how to train controlnets](https://civitai.com/articles/2078) using diffusers, and trained [two]…