-
[instagirl_config.json](https://github.com/user-attachments/files/16643076/instagirl_config.json)
### What happened?
I've been experiencing the same error for the last few months every time I tr…
-
### What is the issue?
Using `ollama:latest` with nvidia-docker and 2x4090.
Tried blasting a batch of 256-word text snippets at ollama for embedding generation using `all-minilm:l6-v2`.
…
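For reference, a minimal sketch of how such a batch might be chunked and turned into embedding requests. The `/api/embeddings` endpoint and `{"model", "prompt"}` payload shape reflect Ollama's HTTP API; the helper names here are hypothetical and the request-sending step is omitted:

```python
import json

def chunk_words(text, max_words=256):
    """Split text into snippets of at most max_words words each."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

def embedding_payloads(text, model="all-minilm:l6-v2", max_words=256):
    """Build one JSON payload per snippet, ready to POST to /api/embeddings."""
    return [json.dumps({"model": model, "prompt": snippet})
            for snippet in chunk_words(text, max_words)]

# 600 words split into chunks of <= 256 words -> 3 payloads
payloads = embedding_payloads("word " * 600)
```

Each payload would then be POSTed individually; Ollama's embeddings endpoint processes one prompt per request, which is why a large batch turns into many HTTP calls.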
-
If you open a GitHub issue, here is our policy:
It must be a bug, a feature request, or a significant problem with the documentation (for small docs fixes please send a PR instead).
The form below…
-
How-to question: I've been testing and learning with the "adanet_objective" sample. How do you use the recommended model to run prediction samples and eventually serve it on a live data feed?
-
### Search before asking
- [X] I have searched the issues and found no related answer.
### Please ask your question
Question: After deploying the service by following deploy/serving/python/README.md, tes…
-
#### Description
LMCache retrieves tensors chunk by chunk and then concatenates them, and it stores them by stacking the tensors within a chunk. Both operations introduce unnecessary in-memory copies.
It would be better if LMCache …
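To illustrate the kind of copy being described, here is a sketch using NumPy as a stand-in for the tensor library (the shapes are made up): concatenating or stacking allocates a fresh buffer and copies every element, whereas writing chunks into a preallocated buffer, and reading them back as slices, avoids the extra copy.

```python
import numpy as np

chunk_size, dim, n_chunks = 4, 8, 3
chunks = [np.random.rand(chunk_size, dim) for _ in range(n_chunks)]

# Copying path: np.concatenate allocates a new buffer and copies all chunks.
copied = np.concatenate(chunks, axis=0)

# Copy-avoiding path: preallocate once, then write each chunk in place.
buf = np.empty((n_chunks * chunk_size, dim))
for i, c in enumerate(chunks):
    buf[i * chunk_size:(i + 1) * chunk_size] = c  # single write per chunk

# Reading back by chunk: basic slicing returns a view, not a copy.
view = buf[chunk_size:2 * chunk_size]
assert view.base is buf  # the view shares memory with the buffer
```

The same idea applies to GPU tensors: reserving the full buffer up front and filling it per chunk replaces the concatenate/stack copies with direct writes.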
-
-
Is it working right now in any way?
-
### 🚀 The feature, motivation and pitch
Paper from Microsoft on serving loras in production
https://arxiv.org/abs/2404.05086v1
### Alternatives
_No response_
### Additional context
_No respons…
-
I am trying to use Llama-2-70b-chat-hf as a zero-shot text classifier for my datasets. Here is my setup.
1. vLLM + Llama-2-70b-chat-hf
I used vLLM as my inference engine and ran it with:
```
pyt…