-
### Your current environment
Using:
* vllm 0.4.1
* nccl 2.18.1
* pytorch 2.2.1
### 🐛 Describe the bug
During inference I sometimes get this error:
```bash
(RayWorkerWrapper pid=2376582…
-
rgrg/src$ python ./full_model/generate_reports_for_images.py
Traceback (most recent call last):
File "/media/Win11/rgrg/src/./full_model/generate_reports_for_images.py", line 20, in
from src…
-
**Describe the bug**
I'm compressing a qwen2.5_7b model using `examples/quantization_2of4_sparse_w4a16/llama7b_sparse_w4a16.py`, but I failed to load the stage_sparsity model. The error is shown belo…
-
The XPU and CPU Intel images referenced in the documentation do not exist:
* https://huggingface.co/docs/text-generation-inference/en/installation_intel
* https://github.com/huggingface/text-generation-infe…
-
Notice: To help us resolve issues more efficiently, please raise issues following the template and fill in the details.
## 🐛 Bug
When I run demo.py, I get the following error:
```
Tracebac…
-
Can I load QLoRA fine-tuning weights into a Hugging Face model as shown below?
```python
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
quantization_config = BitsAndBytesConfig(
load_in_4bit=T…
-
**Build Scans:**
- [elasticsearch-periodic #4727 / openjdk17_checkpart2_java-fips-matrix](https://gradle-enterprise.elastic.co/s/suslzsefkzbd4)
- [elasticsearch-periodic #4712 / openjdk17_checkpart2_j…
-
**Description**
Tooltips that are present on an element should be available to all users, including those who use only the keyboard (not only those who use a mouse).
**Preconditions**
Stateful Indices -> …
-
### Describe the issue
I have a detector with FP16 and FP32 weights (ONNX).
Below is the FP32 code, which gives correct detections when running inference with the FP32 weights.
```
void process_image…
-
Hi authors,
I want to test the performance of Mistral7B on the test dataset. Is it only possible to do single-sample inference (with model.generate(...))? Are there any methods to accelerate t…