-
Hi, I hit an anomaly while running inference on Mistral with AWQ: below is the GPU usage on a 3090, where the AWQ model consumes 20 GB of GPU memory, even though inference on the base model consumes only 19 GB.
Here is the command: python -m vl…
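For context, vLLM preallocates a KV-cache pool sized by --gpu-memory-utilization (0.9 by default), so the reported usage mostly reflects that pool rather than the weight size, and an AWQ model can show nearly the same footprint as the base model. A minimal sketch of an AWQ launch with a smaller pool, assuming the standard api_server entrypoint (the model repo and port are illustrative placeholders, not the reporter's):

```bash
# Sketch only: serve an AWQ-quantized Mistral with a smaller KV-cache pool.
# --quantization and --gpu-memory-utilization are standard vLLM flags;
# the model repo and port are placeholders.
python -m vllm.entrypoints.openai.api_server \
    --model TheBloke/Mistral-7B-Instruct-v0.2-AWQ \
    --quantization awq \
    --gpu-memory-utilization 0.7 \
    --port 8000
```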
-
### Your current environment
```text
tiktoken==0.6.0
transformers==4.38.1
tokenizers==0.15.2
vLLM Version: 0.4.3
fastchat Version: 0.2.36
```
### 🐛 Describe the bug
Currently, I'm using fa…
-
### OpenVINO Version
2024.3
### Operating System
Windows System
### Device used for inference
Intel UHD Graphics GPU
### Framework
None
### Model used
meta-llama/Llama-3.2-3…
-
Flux Schnell and Flux Dev output the image as base64, while black-forest-labs/flux-1.1-pro outputs a direct link to the image on the Replicate server. Is this normal?
Here is the call used for…
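Replicate models define their own output schema, so a difference like this is plausible rather than a client bug. For reference, a sketch of such a call through Replicate's HTTP API (not necessarily the reporter's exact call, and the prompt is a placeholder); what differs between models is the shape of the `output` field in the returned prediction:

```bash
# Sketch: create a prediction on an official model via Replicate's HTTP API.
# Requires REPLICATE_API_TOKEN to be set; swap the model path to compare
# flux-schnell, flux-dev, and flux-1.1-pro responses.
curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"input": {"prompt": "a photo of a cat"}}' \
  https://api.replicate.com/v1/models/black-forest-labs/flux-schnell/predictions
```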
-
The stack tool does not support large models with a .pth extension downloaded from Meta; it throws an error at runtime. Does it have to use models downloaded from Hugging Face? Is this setup unreaso…
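If the tool only loads Hugging Face-format checkpoints (an assumption based on the description), one common workaround is converting Meta's .pth weights with the conversion script that ships in the transformers source tree; the paths and size below are placeholders:

```bash
# Sketch: convert Meta's consolidated .pth checkpoints to the Hugging Face
# layout using transformers' bundled script (paths and size are placeholders).
python transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir /path/to/meta/llama \
    --model_size 7B \
    --output_dir /path/to/hf/llama
```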
-
### Checked other resources
- [X] I added a very descriptive title to this issue.
- [X] I searched the LangChain documentation with the integrated search.
- [X] I used the GitHub search to find a…
-
I used the official Docker image and downloaded the weight files from Meta. The md5sum check confirmed the files were fine, but it still failed to run, which left me confused. I confirm that CUDA can be …
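Since the report is cut off before the CUDA detail, only a generic sanity check is worth sketching: confirm the container can see the GPU at all before suspecting the weights (the CUDA image tag is illustrative):

```bash
# Sketch: verify GPU passthrough into Docker; if this fails, the weight
# files were never the problem.
docker run --rm --gpus all nvidia/cuda:12.1.1-base-ubuntu22.04 nvidia-smi
```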
-
### 🚀 The feature, motivation and pitch
I use vllm.entrypoints.openai.api_server to start my large model; the specific command is as follows:
```bash
python3 -m vllm.entrypoints.openai.api_server…
```
-
How can I disable this kind of error log? Sometimes the network is unstable and the connection is interrupted frequently.
![Dingtalk_20240717155321](https://github.com/user-attachments/assets/9028ef4e-98a6-4f39-a…
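A sketch of one way to quiet such logs, assuming (as with the other reports in this collection) a vLLM-based server; VLLM_LOGGING_LEVEL is a real vLLM environment variable, but check the logger prefix in your own output before relying on it:

```bash
# Sketch: raise the log level so transient network errors are not printed.
# The environment variable is vLLM's; substitute your server's equivalent
# if these logs come from a different project.
export VLLM_LOGGING_LEVEL=ERROR
python -m vllm.entrypoints.openai.api_server --model <model>
```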
-
### Version
5.1.0
### Feature
This is a very loose feature idea; it's not urgent or anything.
It would be useful if, when creating a new dataset in the Fuseki UI, the user were presented with…
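For context on what such a feature would extend: dataset creation in the UI goes through Fuseki's HTTP administration protocol, which today takes only a name and a store type. A sketch of the existing endpoint, assuming an unsecured local server (host and name are placeholders):

```bash
# Sketch: the admin call behind the "new dataset" form. dbType accepts
# "mem" or "tdb2" (persistent); dbName becomes the dataset path.
curl -X POST 'http://localhost:3030/$/datasets?dbName=example&dbType=tdb2'
```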