-
I am trying to use BioMedLM for generation, but I find that generation is very slow for long sequences, while training runs at a normal speed. I wrote a minimal program (below) to reproduce this,…
-
Does vLLM support 8-bit quantization? We need to use vLLM with a large context window (>1K tokens). We tried AWQ, but the generation quality is not good. Any pointers would be greatly appreciated.
-
As the title says, I'm unable to install the latest version through `pip`
```
pip install flash-attn --no-build-isolation
Collecting flash-attn
Using cached flash_attn-2.6.3.tar.gz (2.6 MB)
P…
-
https://github.com/oobabooga/text-generation-webui/ can be used as a back-end to run dozens of different local models, including the latest LLaMA model. (LLaMA-13B beats GPT-3.5 in benchmarks while f…
-
### 🐛 Describe the bug
I am trying to use FSDP, but for some reason there is an error when I call `model.generate()`. MWE below:
```
import torch
import os
from omegaconf import DictConfig
from tra…
-
## Summary
This ticket provides an update on the current research and development efforts related to creating a Unit Test Generator. The primary focus is on leveraging Large Language Models (LLMs), s…
-
> [!TIP]
> ## Want to get involved?
> We'd love it if you did! Please get in contact with the people assigned to this issue, or leave a comment. See general contributing advice [here](https://micros…
-
Can you help me modify the source code to run with the ChatGPT-3.5 API? I don't want to use GPT-4. Thanks.
-
We ran MT-Bench multiple times with llama2-70b-chat.
With the generated text fixed (step 1, run once), the GPT-4 scoring (step 2, run multiple times) varies.
In our experiment it varied by 0.16 over 5 runs.
…
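
One way to quantify this run-to-run variation (a minimal sketch; the per-run mean scores below are illustrative placeholders chosen only so their spread matches the 0.16 figure, not our actual data):

```python
# Hypothetical per-run mean MT-Bench scores from 5 GPT-4 scoring runs
# over the same generated text (illustrative values, not real results).
mean_scores = [6.78, 6.86, 6.70, 6.82, 6.74]

# Spread (max - min) across runs: the "varied by 0.16" style of number.
spread = max(mean_scores) - min(mean_scores)

# Population standard deviation, as a second stability measure.
mean = sum(mean_scores) / len(mean_scores)
std = (sum((s - mean) ** 2 for s in mean_scores) / len(mean_scores)) ** 0.5

print(f"spread over {len(mean_scores)} runs: {spread:.2f}")
print(f"std dev: {std:.3f}")
```

Reporting both the spread and the standard deviation over several scoring runs makes it easier to tell judge noise apart from genuine model differences.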
-
# URL
- https://arxiv.org/abs/2305.14282
# Affiliations
- Wenda Xu, N/A
- Danqing Wang, N/A
- Liangming Pan, N/A
- Zhenqiao Song, N/A
- Markus Freitag, N/A
- William Yang Wang, N/A
- Lei…