-
## Software Quality
- [ ] Refactor Code #9
- [ ] Simple way to add a new model
## Implementation
- [x] Support FlashAttention
- [x] Support Sampling
- [ ] Support Batch>1
- [ ] Lookahead wi…
-
**Bug Description**
Cost/Token Usage display does not work with AzureOpenai Provider
**To Reproduce**
Try to evaluate a RAG application using the AzureOpenai provider for any metric like grounded…
-
Darwin Feedloops-Mac-Studio-2.local 23.3.0 Darwin Kernel Version 23.3.0: Wed Dec 20 21:31:00 PST 2023; root:xnu-10002.81.5~7/RELEASE_ARM64_T6020 arm64
command: python -m llama_cpp.server --model ./…
-
I may be seeing a problem where, if the LLM fails to give a good response while we are using DEMO_MODE=true, we get stuck in a state from which we cannot recover. This is not confirmed and is only a guess b…
-
### Checklist
- [X] 1. I have searched related issues but cannot get the expected help.
- [X] 2. The bug has not been fixed in the latest version.
### Describe the bug
Deploying InternVL-Chat-V1-5 with version 0.5.0…
-
python -m fastchat.serve.openai_api_server --host localhost --port 8000
INFO: Started server process [17636]
INFO: Waiting for application startup.
INFO: Application startup complete.
…
-
### Python Version
```shell
Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
```
### Pip Freeze
```shell
absl-py==2.1.0
annotated-types==0.7.0
anyio==4.0.0
argon2-cffi==23.1.…
-
When using the OpenAI interface, I would like to be able to freely choose between different LoRA adapters and the original model.
-
### Is your feature request related to a problem? Please describe
A CLI chat application.
### Describe the solution you'd like
I want to build a command-line chat application in Python using `sock…
-
I have put the `Dahous/rm-static` dataset as well as the model `facebook/opt-1.3b` under the path
**DeepSpeedExamples/applications/DeepSpeed-Chat/training/step1_supervised_finetuning**
When r…