-
## Environment
- OS: Databricks Runtime 15.3 ML with mosaicml-streaming 0.8.1.
- Hardware (GPU, or instance type): g4dn.12xlarge
## To reproduce
Steps to reproduce the behavior:
```…
-
https://huggingface.co/smallcloudai/Refact-1_6B-fim - via https://news.ycombinator.com/item?id=37381862
-
We want to design a public API so that Sudachi would be usable as follows.
The syntax may be slightly invalid, and all names are open for discussion.
```rust
let model = JapaneseModel::from_cfg("...")?;…
-
Let's face it: these models are developed with static datasets, but a primary use case is streaming audio transcription.
Please include a microphone-based demo (or suffer 1000 GitHub issues begging …
-
One more question: how can a trained SAT model be loaded and run for inference across two GPUs? I currently only have the A100 (40G) version, and inference sometimes fails with an out-of-memory error.
How do I set up multi-GPU inference?
How should the model-loading code below be configured? Thanks.
```python
# load model
model, model_args = AutoModel.from_pretrained(
    args.from_pretra…
```
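The loading snippet above is truncated, so the following is only a hedged sketch. If this `AutoModel` follows the Hugging Face `transformers`/`accelerate` convention (an assumption; the SAT library's own API may differ), passing `device_map="auto"` or a manual layer-to-device map shards the weights across both GPUs, which is the usual fix for single-GPU OOM at inference time. The helper below is a hypothetical illustration of what such a "balanced" map does, splitting layers into contiguous, near-equal chunks:

```python
# Hypothetical illustration of a "balanced" device map: the function name
# and the "layers.N" key format are assumptions for demonstration, not the
# SAT (or transformers) API itself.
def balanced_device_map(num_layers: int, num_gpus: int) -> dict:
    """Assign transformer layers to GPUs in contiguous, near-equal chunks,
    mimicking what accelerate-style device maps do for multi-GPU inference."""
    per_gpu, rem = divmod(num_layers, num_gpus)
    mapping, layer = {}, 0
    for gpu in range(num_gpus):
        # Earlier GPUs absorb the remainder, one extra layer each.
        count = per_gpu + (1 if gpu < rem else 0)
        for _ in range(count):
            mapping[f"layers.{layer}"] = gpu
            layer += 1
    return mapping

# 28 layers over 2 GPUs: the first half lands on GPU 0, the rest on GPU 1.
dm = balanced_device_map(28, 2)
```

With Hugging Face transformers, the one-line equivalent would be `AutoModel.from_pretrained(path, device_map="auto")`, which builds such a map automatically from the available GPU memory.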
-
Using the API server and submitting multiple prompts to take advantage of the speed benefit returns the following error:
"multiple prompts in a batch is not currently supported"
What's the point of …
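Since the server rejects multi-prompt batches, a common workaround is to issue one request per prompt concurrently and let the server batch them internally (vLLM's continuous batching handles concurrent single-prompt requests). Below is a minimal sketch of that pattern; the `complete` coroutine is a hypothetical stand-in for the real HTTP call, which you would make with an async HTTP client in practice:

```python
import asyncio

# Hypothetical stand-in for an HTTP POST to the server's completions
# endpoint; swap in a real async HTTP client (e.g. aiohttp) in practice.
async def complete(prompt: str) -> str:
    await asyncio.sleep(0)  # simulate network I/O
    return f"completion for: {prompt}"

async def complete_all(prompts: list[str]) -> list[str]:
    # One request per prompt, issued concurrently; the server can then
    # batch the in-flight requests on its side.
    return await asyncio.gather(*(complete(p) for p in prompts))

results = asyncio.run(complete_all(["a", "b", "c"]))
```

The key point is that the batching happens server-side across concurrent requests, so the client never needs to send more than one prompt per request.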
-
System environment:
```
(Orion) PS D:\Huggin face\Orion-14B-App-Demo-CN\demo> nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Wed_Feb__8_05:53:42_Coordinated_Univer…
```
-
Hi,
this seems to affect `AzureOpenAiStreamingChatModel`. I am hypothesising that `LocalAiStreamingChatModel` is affected as well, since the code that handles the response code seems to be similar to `AzureOpenAiStreaming…
-
### Your current environment
Docker latest 0.5.4
```
docker pull vllm/vllm-openai:latest
docker run -d --restart=always \
--runtime=nvidia \
--gpus '"device=0"' \
--shm-size=10.…
-
### Feature request
I have downloaded the model, so I want to run it using the local model. The sample is:
```
docker run --gpus all --shm-size 1g -p 8080:80 -v /data/model/:/data/ \
  ghcr.io/predibase/lora…
```