bentoml / BentoVLLM
Self-host LLMs with vLLM and BentoML
72 stars · 12 forks

Issues
#92 feat: add llama-3.1-8b function calling example · larme · closed 5 days ago · 0 comments
#91 fix: accurate num_gpus for jamba-1.5-mini · larme · closed 1 week ago · 0 comments
#90 feat: add jamba-1.5-mini fp16 bento · larme · closed 1 week ago · 0 comments
#89 Is the bentovllm_openai code the same across the different models? Why not publish it to pip? · frei-x · opened 1 week ago · 0 comments
#88 vLLM already supports the OpenAI API, so why use bentovllm_openai? (see the usage sketch after this list) · frei-x · opened 1 week ago · 0 comments
#87 Feat/llama3.1 function calling · larme · closed 1 week ago · 0 comments
#86 docs: Update readmes · Sherlock113 · closed 1 week ago · 0 comments
#85 feat: add ai21 jamba models · larme · closed 1 week ago · 0 comments
#84 docs: Add secret creation step · Sherlock113 · closed 2 weeks ago · 0 comments
#83 docs: Add quotation marks to python versions · Sherlock113 · closed 1 month ago · 0 comments
#82 fix(llama3.2): bento name · aarnphm · closed 1 month ago · 0 comments
#81 fix(llama3.2): concurrency to match max_num_seqs · aarnphm · closed 1 month ago · 0 comments
#80 docs(llama3.2): update model name · Sherlock113 · closed 1 month ago · 0 comments
#79 docs(llama3.2): update model name · Sherlock113 · closed 1 month ago · 0 comments
#78 feat: llama 3.2 vision · aarnphm · closed 1 month ago · 0 comments
#77 feat: add mistral nemo 2407 · larme · closed 1 month ago · 0 comments
#76 chore: update vllm version to 0.6.1.post2 · larme · closed 1 month ago · 0 comments
#75 chore: update vllm version to 0.6.1.post2 · larme · closed 1 month ago · 0 comments
#74 feat: add mistral-7b v0.3 support tools=auto · larme · closed 1 month ago · 0 comments
#73 chore(deps): bump the pip group across 2 directories with 1 update · dependabot[bot] · opened 1 month ago · 0 comments
#72 improv: polish pixtral example · larme · closed 1 month ago · 0 comments
#71 feat: add pixtral support · larme · closed 1 month ago · 0 comments
#70 feat: add GGUF model format example · larme · closed 2 months ago · 0 comments
#69 fix: allow building bento on mac again · larme · closed 2 months ago · 0 comments
#68 feat: add num_scheduler_steps arg to AsyncEngine · larme · opened 2 months ago · 0 comments
#67 chore: update to vllm 0.6.0 · larme · closed 2 months ago · 0 comments
#66 fix: use awq_marlin instead of awq · larme · closed 2 months ago · 0 comments
#65 docs: Specify Python versions · Sherlock113 · closed 2 months ago · 0 comments
#64 fix: typo · bojiang · closed 2 months ago · 0 comments
#63 chore: fix service name typos · ssheng · closed 2 months ago · 0 comments
#62 chore: lower gpu memory utilization · larme · closed 1 month ago · 0 comments
#61 feat: add phi3-mini · larme · closed 2 months ago · 0 comments
#60 chore: upgrade to vllm==0.5.3.post1 · larme · closed 3 months ago · 0 comments
#59 feat: Support llama-3.1 405B awq · rickzx · closed 3 months ago · 0 comments
#58 Add support for llama 3.1 · rickzx · closed 3 months ago · 0 comments
#57 awq quantization is not fully optimized yet. The speed can be slower than non-quantized models. · jackNhat · closed 1 month ago · 0 comments
#56 fix: allow building bento on Mac/Windows · larme · closed 3 months ago · 0 comments
#55 chore: add gemma-7b-it · larme · closed 4 months ago · 0 comments
#54 chore: relax outlines integration's vllm version · larme · closed 4 months ago · 0 comments
#53 chore: upgrade to vllm 0.5.0 · larme · closed 4 months ago · 0 comments
#52 Fix: llama3 bento deployment configuration · larme · closed 5 months ago · 0 comments
#51 Chore: vllm upgrade to 0.4.3 · larme · closed 5 months ago · 0 comments
#50 fix: downgrade vllm version for outlines integration · larme · closed 5 months ago · 0 comments
#49 feat: update openai decorator for every model · larme · closed 5 months ago · 0 comments
#48 fix: minor openai decorator fix · larme · closed 5 months ago · 0 comments
#47 feat: openai endpoints decorator with default parameters · larme · closed 5 months ago · 0 comments
#46 chore: set service API concurrency to match vLLM default max_num_seqs · rickzx · closed 5 months ago · 0 comments
#45 chore: serving llama3-8b without importing model first · larme · closed 5 months ago · 0 comments
#44 fix: openai endpoints for vllm 0.4.2 · larme · closed 5 months ago · 0 comments
#43 Fix compatibility issue with vllm 0.4.2. Enable prefix caching · rickzx · closed 5 months ago · 0 comments
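Several entries above (#47, #48, #49, #88, #89) revolve around the bentovllm_openai helper that each example in this repo vendors to expose OpenAI-compatible routes on a BentoML Service. Below is a minimal sketch of that usage pattern: the openai_endpoints import matches the helper the issue titles name, but the decorator's parameter name, the engine arguments, and the model ID are assumptions for illustration, not a verbatim copy of the repo's code.

```python
import bentoml

# Helper vendored into each example directory in this repo; it mounts
# OpenAI-compatible routes (e.g. /v1/chat/completions) onto the service.
from bentovllm_openai.utils import openai_endpoints

MODEL_ID = "meta-llama/Meta-Llama-3.1-8B-Instruct"  # illustrative model choice


@openai_endpoints(model_id=MODEL_ID)  # parameter name assumed; see #47
@bentoml.service(
    traffic={"timeout": 300},
    resources={"gpu": 1},
)
class VLLM:
    def __init__(self) -> None:
        from vllm import AsyncEngineArgs, AsyncLLMEngine

        # The decorated service shares this vLLM engine with the
        # OpenAI-style routes, so one deployment serves both BentoML's
        # native endpoints and OpenAI-compatible clients.
        self.engine = AsyncLLMEngine.from_engine_args(
            AsyncEngineArgs(model=MODEL_ID, enable_prefix_caching=True)
        )
```

Issue #88 asks why this helper exists given vLLM's built-in OpenAI server; the sketch only shows the design difference, namely that the decorator attaches the OpenAI surface to a BentoML Service so the same bento carries the deployment configuration and any additional endpoints.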