-
### Your current environment
Collecting environment information...
INFO 08-28 14:32:56 importing.py:10] Triton not installed; certain GPU-related functions will not be available.
WARNING 08-28 14:3…
-
When I use the multimodal example, I downloaded the original model liuhaotian/llava-v1.5-7b, but this error occurs:
```text
llama = from_hugging_face(
  File "/usr/local/lib/python3.10/dist-packages/tensor…
```
-
With the increasing popularity of LLMs, many companies have started to look into deploying LLMs.
Instead of `infer/predict`, `completions` and `embeddings` are being used. Most of the API supports…
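As a sketch of what these OpenAI-style endpoints expect, the helpers below build request bodies for `completions` and `embeddings`. The model names and token limit are placeholders, not values from the original report:

```python
import json

def completions_payload(prompt: str, model: str = "my-model", max_tokens: int = 16) -> dict:
    # Body for an OpenAI-style POST to /v1/completions.
    return {"model": model, "prompt": prompt, "max_tokens": max_tokens}

def embeddings_payload(text: str, model: str = "my-embedding-model") -> dict:
    # Body for an OpenAI-style POST to /v1/embeddings.
    return {"model": model, "input": text}

# Serialize one body to see the wire format.
print(json.dumps(completions_payload("Hello, world"), indent=2))
```

Any OpenAI-compatible server (vLLM, llama.cpp's server, etc.) accepts bodies of this shape, which is why `completions`/`embeddings` have displaced ad-hoc `infer/predict` routes.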
-
As per title.
Example: with GPUs like 3060 12GB or 3090 24GB.
-
In line with the main philosophy of the Symbiont app, we want to use products that are open source and provide the option for self-hosting for maximum privacy and control.
-
## What
Let's support shape inference for operators whose inputs have dynamic shape.
I've made a list of operators to support dynamic-shaped LLM inference.
### First milestone (for token gen …
-
Some users may need to send batch requests with several prompt/schema pairs. It is possible to do this with the vLLM server integration using `aiohttp`, and we should document this.
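A minimal sketch of such batched requests, assuming a vLLM OpenAI-compatible server at a placeholder local URL and vLLM's `guided_json` extra parameter for passing each schema (both are assumptions, not part of the original report):

```python
import asyncio

API_URL = "http://localhost:8000/v1/completions"  # assumed local vLLM server

def build_payloads(pairs, model="my-model"):
    # One JSON body per (prompt, schema) pair; the schema rides along in
    # vLLM's guided_json field for schema-constrained generation.
    return [
        {"model": model, "prompt": prompt, "max_tokens": 64, "guided_json": schema}
        for prompt, schema in pairs
    ]

async def send_batch(payloads):
    # aiohttp is imported lazily so the payload helper works without it installed.
    import aiohttp
    async with aiohttp.ClientSession() as session:
        async def post(body):
            async with session.post(API_URL, json=body) as resp:
                return await resp.json()
        # Fire all requests concurrently; the server batches them internally.
        return await asyncio.gather(*(post(body) for body in payloads))

# With a server running:
#   pairs = [("Generate a user", {"type": "object"}),
#            ("Generate a city", {"type": "object"})]
#   results = asyncio.run(send_batch(build_payloads(pairs)))
```

Because `asyncio.gather` sends the requests concurrently, the server can schedule all prompts in the same continuous batch rather than serving them one at a time.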
-
### When the content type of the incoming messages is text, an error occurs.
**API**: `/v1/chat/completions`
### request
```json
{
"max_tokens": 0,
"model": "qwen-72b-chat-int4"…
-
### Your current environment
The output of `python collect_env.py`
```text
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N…
-
**Describe the bug**
When I run `ilab data generate`, there is no progress update or output like there was in 0.17.1.
```
(venv-instructlab-3.11) ➜ instructlab ilab data generate
INFO 2024-08-08 16:00:04,437 numexpr.utils…