-
### 🚀 The feature
Right now, mainly proprietary LLMs are supported. It would be great to also support DIY/OSS LLMs - for instance, models hosted in [Databricks Model Serving](https://docs.databricks.com/en/mach…
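As a rough illustration of the kind of endpoint this refers to, here is a minimal sketch of querying an OSS LLM hosted in Databricks Model Serving over its REST invocations API; the workspace URL, endpoint name, token variable, and payload shape are placeholders and depend on how the endpoint was created.
```python
# Hedged sketch (not MLflow integration code): query a hypothetical OSS LLM
# endpoint hosted in Databricks Model Serving via its invocations REST API.
import os
import requests

WORKSPACE = "https://my-workspace.cloud.databricks.com"  # placeholder workspace URL
ENDPOINT = "my-oss-llm"                                   # placeholder endpoint name
url = f"{WORKSPACE}/serving-endpoints/{ENDPOINT}/invocations"

resp = requests.post(
    url,
    headers={"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"},
    # The expected payload depends on the served model; a chat-style body is assumed here.
    json={"messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 64},
)
resp.raise_for_status()
print(resp.json())
```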
-
### Issues Policy acknowledgement
- [X] I have read and agree to submit bug reports in accordance with the [issues policy](https://www.github.com/mlflow/mlflow/blob/master/ISSUE_POLICY.md)
### W…
-
Running `cmake` is not successful.
```
❯ cmake --version
cmake version 3.21.0
CMake suite maintained and supported by Kitware (kitware.com/cmake).
```
```
mkdir build
cd build
cmake -DCMAKE_INSTA…
```
-
Hello, I'm using 24.03-trtllm-python-py3 with an image size of 8.38 GB, which is not small but OK.
I'm going to migrate to newer versions like 24.04 or 24.05, but the image size drastically increased to 18.46 …
-
Greetings, @cipher982!
I've seen the benchmark application https://www.llm-benchmarks.com/local and it looks great! I'm currently working on a competitive analysis of these 4 backends: Transformers…
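For context on how such a comparison might be instrumented, here is a minimal, hedged sketch of a tokens-per-second probe for the Hugging Face Transformers backend; the model name, prompt, and generation length are placeholders, and the other backends would need their own equivalents.
```python
# Hedged sketch of a per-backend latency probe using Hugging Face Transformers.
import time
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Explain speculative decoding in one paragraph."  # placeholder prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

start = time.perf_counter()
output = model.generate(**inputs, max_new_tokens=128)
elapsed = time.perf_counter() - start

new_tokens = output.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens} tokens in {elapsed:.2f}s -> {new_tokens / elapsed:.1f} tok/s")
```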
-
### Checked other resources
- [X] I added a very descriptive title to this issue.
- [X] I searched the LangChain documentation with the integrated search.
- [X] I used the GitHub search to find a sim…
-
Hi FlexFlow team,
I used the methods mentioned in #1099 to test the latency (GPU: RTX 4090), but I get a confusing result:
1) LLaMA-7B + 1 SSM (llama-160M), latency: 25.1 s
2) LLaMA-7B (without SSMs), la…
-
### Your current environment
Using the latest available Docker image: vllm/vllm-openai:v0.5.0.post1
### 🐛 Describe the bug
I am getting "Internal Server Error" as the response when calling the /v1/embedd…
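A minimal reproduction sketch, assuming the server exposes the OpenAI-compatible embeddings route and is serving an embedding-capable model; the base URL, API key, and model name below are placeholders.
```python
# Hedged reproduction sketch against a local vLLM OpenAI-compatible server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # placeholder URL/key
resp = client.embeddings.create(
    model="intfloat/e5-mistral-7b-instruct",  # placeholder embedding model
    input="hello world",
)
print(len(resp.data[0].embedding))
```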
-
- [ ] [LoRA Land: Fine-Tuned Open-Source LLMs that Outperform GPT-4 - Predibase](https://predibase.com/blog/lora-land-fine-tuned-open-source-llms-that-outperform-gpt-4)
# LoRA Land: Fine…
-
Hello. Thank you for providing vLLM as a great open-source tool for inference and model serving! I was able to build vLLM on a cluster I maintain, but it only appears to work on a single MI210 GPU. Can so…
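For reference, multi-GPU inference in vLLM is normally driven through tensor parallelism; below is a minimal sketch using the offline LLM API, assuming two visible GPUs. The model name and parallel degree are placeholders, and this does not address any ROCm-specific build issue.
```python
# Minimal sketch of tensor-parallel inference with vLLM's offline LLM API,
# assuming two GPUs are visible; model and parallel degree are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-7b-hf", tensor_parallel_size=2)
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```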