-
### System Info
* docker image: ghcr.io/huggingface/text-generation-inference:2.0.2
* docker image: ghcr.io/huggingface/text-generation-inference:2.1.1
### Information
- [X] Docker
- [ ] The CLI…
-
![image](https://github.com/user-attachments/assets/a5c487e9-97a5-4367-8980-32e2ce129a38)
Running webui.py in Docker on the Windows platform:
```
Traceback (most recent call last):
File "/usr/local/lib/python3.8/di…
-
### Description
The cohere rerank implementation allows configuring fields that probably don't apply. The implementation leverages the common settings here: https://github.com/elastic/elasticsearch/b…
-
![image](https://github.com/user-attachments/assets/9686e584-0af5-447a-88d1-b27bff5262e8)
---------------------------------------------------------------------------
ModelError …
-
When I use the API method for inference, I get an error when using multiple GPUs.
I also noticed that api.py imports run_old.py; does that mean it cannot use multiple GPUs?
-
I want to perform inference on quantized LLAMA (W8A16) on ARM-v9 (with SVE) using oneDNN. The LLAMA weights are per-group quantized.
Based on my understanding, I need to prepack the weights to redu…
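As background on what "per-group quantized" means here, each contiguous group of weights gets its own scale factor. A minimal plain-Python sketch (function names are illustrative, not part of oneDNN) of symmetric int8 per-group quantization:

```python
def quantize_per_group(weights, group_size):
    """Quantize a flat list of float weights to int8 values,
    using one symmetric scale per group of `group_size` weights."""
    q, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        # Symmetric scale: map the largest magnitude in the group to 127.
        scale = max(abs(w) for w in group) / 127.0 or 1.0
        scales.append(scale)
        q.extend(max(-128, min(127, round(w / scale))) for w in group)
    return q, scales

def dequantize_per_group(q, scales, group_size):
    """Recover approximate float weights from int8 values and group scales."""
    return [v * scales[i // group_size] for i, v in enumerate(q)]
```

For W8A16, the int8 values above would be dequantized (or fused into the matmul) against fp16 activations; prepacking then stores the quantized weights in the layout the matmul primitive prefers.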
-
Got the error "Error Building Component
Error building vertex Hugging Face API: Failed to resolve model_id:Could not find model id for inference server: https://api-inference.huggingface.co/models/mi…
-
There seems to be an issue running the model on huggingface (https://huggingface.co/Norm/nougat-latex-base), as responses seem to be cut short. Take for example this image:
![Screenshot 2023-11-29 …
-
It seems that move_to_gpu & move_to_cpu are not working as expected in the fast_inference branch.
https://github.com/RVC-Boss/GPT-SoVITS/blob/fast_inference_/api_v3.py#L327-L343
It will alway…
-
Hi there,
I believe I almost have this all figured out, and it's working great. One issue I'm having is that after inferring once using the API, memory usage stays very high (12.7 GB out of 16), e…
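Not a diagnosis of this particular report, but high resident memory after an API call often comes from the server keeping references to large inference buffers in a cache. A minimal sketch of the usual mitigation pattern (the `cache` dict and function name are hypothetical, not from this project):

```python
import gc

def release_inference_buffers(cache):
    """Drop references to large buffers held from a previous request
    and force a garbage-collection pass so memory can be reclaimed.
    `cache` is a hypothetical dict holding arrays/tensors from inference."""
    cache.clear()
    gc.collect()
    # If the backend is PyTorch on CUDA, following this with
    # torch.cuda.empty_cache() returns freed blocks to the driver,
    # which is what shows up in nvidia-smi / system monitors.
```

Note that even after this, allocators often keep freed pages resident, so reported process memory may stay above the true live working set.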