-
I am trying to speed up benchmarking on an A100. Below are the test timings for one task in two versions, using Mistral.
![image](https://github.com/EleutherAI/lm-evaluation-harness/assets/1849959/f012818…
-
**Describe the bug**
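After fine-tuning yi-6B-chat with rsLoRA, inference works correctly in both the Swift web UI and the command line, but the output is garbled once the model is served through the fastchat backend.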
![image](https://github.com/modelscope/swift/assets/28507966/ed565565-c6b7-4488-a491-ef5c19048d5b)
For LoRA fine-tuning of yi-6B-…
-
### The model to consider.
https://huggingface.co/facebook/chameleon
(as of now, the models can be downloaded using the [model form](https://ai.meta.com/resources/models-and-libraries/chameleon-do…
-
IMAGE SYNC
eijix updated
3 weeks ago
-
### 🚀 The feature, motivation and pitch
As the title suggests
Currently, vLLM supports MoE models, but not their quantized versions. In practice, a quantized version would provide better cost-…
-
### Your current environment
```text
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
…
-
### Discussed in https://github.com/outlines-dev/outlines/discussions/683
Originally posted by **lapp0** January 23, 2024
### What behavior of the library made you think about the improvement?…
-
I would like to run embedding as a service, using something like vLLM in a Docker container on a different host. How would one go about doing this?
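
For illustration, here is a minimal client-side sketch of what querying such a service could look like. It assumes the container on the remote host exposes an OpenAI-style `/v1/embeddings` HTTP endpoint (vLLM's OpenAI-compatible server is one way to get that, if the chosen model supports embeddings). The host name, port, and model name are hypothetical placeholders, and `build_embedding_request` and `fetch_embeddings` are helper names made up for this example, not part of any library.

```python
import json
import urllib.request


def build_embedding_request(host, port, model, texts):
    """Build a POST request for an OpenAI-style /v1/embeddings endpoint.

    Assumption: the remote server accepts {"model": ..., "input": [...]}.
    """
    url = f"http://{host}:{port}/v1/embeddings"
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_embeddings(request, timeout=30.0):
    """Send the request and return one embedding vector per input text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-style responses carry the vectors under data[i]["embedding"].
    return [item["embedding"] for item in body["data"]]


# Example usage (hypothetical host and model; needs a reachable server):
# req = build_embedding_request("gpu-host.example", 8000,
#                               "my-embedding-model", ["hello world"])
# vectors = fetch_embeddings(req)
```

Only the standard library is used, so the machine calling the service needs no GPU and no extra dependencies. The container's port has to be published and reachable from that machine.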
-
Hi,
I found that the original script cannot handle large models with long contexts effectively, since it uses multiprocessing to load an entire model onto a single GPU.
I also tried different methods to…
-
On my RX 6800 I get `RuntimeError: FlashAttention only supports AMD MI200 GPUs or newer.` for some reason. I Googled that GPU, and it seems to be RDNA2 like mine, but for enterprise. Is this not…