-
I use Ollama as my inference server for local LLMs. Ollama is supported by many LLM frameworks, but not Guidance.
I would love to see a direct integration with Ollama via the `models` package.
I'm awa…
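In the meantime, one hedged workaround is to point an OpenAI-compatible client at Ollama's built-in OpenAI-compatible endpoint. This is only a sketch, not a Guidance integration: the model name (`llama3`) and the local URL are assumptions based on a default Ollama install, and it bypasses the `models` package entirely.

```python
# Hedged sketch: talk to a local Ollama server through its OpenAI-compatible API.
# Assumes Ollama is running on the default port and that a model such as
# "llama3" has already been pulled (e.g. `ollama pull llama3`).
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",                      # any non-empty string; Ollama ignores it
)

response = client.chat.completions.create(
    model="llama3",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```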
-
**Is your feature request related to a problem? Please describe.**
Segmentation viewing and export to PACS/VNAs outside of the integrated OHIF viewer and/or Slicer. In order to integrate inference pr…
-
Hi,
I am very interested in the distributed inference support in Colossal AI. Since we have pre-trained NLP models from PyTorch or JAX, I wonder whether it is possible, or what would need to be done, to use EnergonAI for infere…
-
### Your current environment
```text
The output of `python collect_env.py`
```
### How would you like to use vllm
I want to run inference with [ColPali](https://huggingface.co/vidore/colpali). I …
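For background, the usual vLLM offline-inference pattern is sketched below. It deliberately uses a small placeholder model (`facebook/opt-125m`) rather than ColPali, since whether ColPali, a late-interaction retrieval model, can be served this way depends on the vLLM version and its multimodal support.

```python
# Hedged sketch of vLLM offline inference with a small placeholder model;
# swap in the target model only if the installed vLLM release supports it.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.0, max_tokens=64)

outputs = llm.generate(["Describe the document in one sentence."], params)
for out in outputs:
    print(out.outputs[0].text)
```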
-
### Prerequisite
- [X] I have searched [Issues](https://github.com/open-mmlab/mmdetection3d/issues) and [Discussions](https://github.com/open-mmlab/mmdetection3d/discussions) but cannot get the expec…
-
Hi,
I am trying to reproduce step 2 of the semantic search through Wikipedia demo on my local machine with an RTX 3090, and while importing data with the `nohup python3 -u import.py &` command I got the follow…
-
Do you support the ExLlamaV2 backend for inference, which supports EXL quants?
The current alternative is vLLM, but that doesn't support EXL quants. Also, after running a perplexity test, EXL is the b…
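For reference, the standalone way to run an EXL2 quant today is through the `exllamav2` library itself. The sketch below follows that project's basic-generator examples; the model path and sampler settings are assumptions, and exact class names may differ between releases.

```python
# Hedged sketch: load and run an EXL2-quantized model with the exllamav2 library.
# Follows the project's basic generator examples; names may vary across releases.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/models/Mistral-7B-exl2-4.0bpw"  # assumed local EXL2 quant
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.7

print(generator.generate_simple("Hello, my name is", settings, num_tokens=64))
```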
-
### Your current environment
```text
Collecting environment information...
/opt/conda/lib/python3.10/site-packages/transformers/utils/hub.py:124: FutureWarning: Using `TRANSFORMERS_CACHE` is deprec…
-
**Description**
I am trying to deploy Mistral-7B with Triton + TensorRT-LLM and am running into this issue.
**Triton Information**
Are you using the Triton container or did you build it yourself?
nvcr.i…
-
The final log output shows:
qanything-container-local | Triton服务正在启动,可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | The triton service is starting up, it can be long... you have time to make a coffee :)
qanyth…