bentoml / BentoVLLM
Self-host LLMs with vLLM and BentoML
72 stars · 12 forks

Issues
#92 feat: add llama-3.1-8b function calling example · larme · closed 5 days ago · 0 comments
#91 fix: accurate num_gpus for jamba-1.5-mini · larme · closed 1 week ago · 0 comments
#90 feat: add jamba-1.5-mini fp16 bento · larme · closed 1 week ago · 0 comments
#89 Is the bentovllm_openai code the same across the different models? Why not publish it to pip? · frei-x · opened 1 week ago · 0 comments
#88 vLLM already supports the OpenAI API, so why use bentovllm_openai? (see the usage sketch after this list) · frei-x · opened 1 week ago · 0 comments
#87 Feat/llama3.1 function calling · larme · closed 1 week ago · 0 comments
#86 docs: Update readmes · Sherlock113 · closed 1 week ago · 0 comments
#85 feat: add ai21 jamba models · larme · closed 1 week ago · 0 comments
#84 docs: Add secret creation step · Sherlock113 · closed 2 weeks ago · 0 comments
#83 docs: Add quotation marks to python versions · Sherlock113 · closed 1 month ago · 0 comments
#82 fix(llama3.2): bento name · aarnphm · closed 1 month ago · 0 comments
#81 fix(llama3.2): concurrency to match max_num_seqs · aarnphm · closed 1 month ago · 0 comments
#80 docs(llama3.2): update model name · Sherlock113 · closed 1 month ago · 0 comments
#79 docs(llama3.2): update model name · Sherlock113 · closed 1 month ago · 0 comments
#78 feat: llama 3.2 vision · aarnphm · closed 1 month ago · 0 comments
#77 feat: add mistral nemo 2407 · larme · closed 1 month ago · 0 comments
#76 chore: update vllm version to 0.6.1.post2 · larme · closed 1 month ago · 0 comments
#75 chore: update vllm version to 0.6.1.post2 · larme · closed 1 month ago · 0 comments
#74 feat: add mistral-7b v0.3 support tools=auto · larme · closed 1 month ago · 0 comments
#73 chore(deps): bump the pip group across 2 directories with 1 update · dependabot[bot] · opened 1 month ago · 0 comments
#72 improv: polish pixtral example · larme · closed 1 month ago · 0 comments
#71 feat: add pixtral support · larme · closed 1 month ago · 0 comments
#70 feat: add GGUF model format example · larme · closed 2 months ago · 0 comments
#69 fix: allow building bento on mac again · larme · closed 2 months ago · 0 comments
#68 feat: add num_scheduler_steps arg to AsyncEngine · larme · opened 2 months ago · 0 comments
#67 chore: update to vllm 0.6.0 · larme · closed 2 months ago · 0 comments
#66 fix: use awq_marlin instead of awq · larme · closed 2 months ago · 0 comments
#65 docs: Specify Python versions · Sherlock113 · closed 2 months ago · 0 comments
#64 fix: typo · bojiang · closed 2 months ago · 0 comments
#63 chore: fix service name typos · ssheng · closed 2 months ago · 0 comments
#62 chore: lower gpu memory utilization · larme · closed 1 month ago · 0 comments
#61 feat: add phi3-mini · larme · closed 2 months ago · 0 comments
#60 chore: upgrade to vllm==0.5.3.post1 · larme · closed 3 months ago · 0 comments
#59 feat: Support llama-3.1 405B awq · rickzx · closed 3 months ago · 0 comments
#58 Add support for llama 3.1 · rickzx · closed 3 months ago · 0 comments
#57 awq quantization is not fully optimized yet. The speed can be slower than non-quantized models. · jackNhat · closed 1 month ago · 0 comments
#56 fix: allow building bento on Mac/Windows · larme · closed 3 months ago · 0 comments
#55 chore: add gemma-7b-it · larme · closed 4 months ago · 0 comments
#54 chore: relax outlines integration's vllm version · larme · closed 4 months ago · 0 comments
#53 chore: upgrade to vllm 0.5.0 · larme · closed 4 months ago · 0 comments
#52 Fix: llama3 bento deployment configuration · larme · closed 5 months ago · 0 comments
#51 Chore: vllm upgrade to 0.4.3 · larme · closed 5 months ago · 0 comments
#50 fix: downgrade vllm version for outlines integration · larme · closed 5 months ago · 0 comments
#49 feat: update openai decorator for every model · larme · closed 5 months ago · 0 comments
#48 fix: minor openai decorator fix · larme · closed 5 months ago · 0 comments
#47 feat: openai endpoints decorator with default parameters · larme · closed 5 months ago · 0 comments
#46 chore: set service API concurrency to match vLLM default max_num_seqs · rickzx · closed 5 months ago · 0 comments
#45 chore: serving llama3-8b without importing model first · larme · closed 5 months ago · 0 comments
#44 fix: openai endpoints for vllm 0.4.2 · larme · closed 5 months ago · 0 comments
#43 Fix compatibility issue with vllm 0.4.2. Enable prefix caching · rickzx · closed 5 months ago · 0 comments
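Several entries above (#47, #48, #49, #88, #89) revolve around the bentovllm_openai helper that each example in this repo vendors to expose OpenAI-compatible routes on a BentoML Service. Below is a minimal sketch of that usage pattern: the openai_endpoints import matches the helper the issue titles name, but the decorator's parameter name, the engine arguments, and the model ID are assumptions for illustration, not a verbatim copy of the repo's code.

```python
import bentoml

# Helper vendored into each example directory in this repo; it mounts
# OpenAI-compatible routes (e.g. /v1/chat/completions) onto the service.
from bentovllm_openai.utils import openai_endpoints

MODEL_ID = "meta-llama/Meta-Llama-3.1-8B-Instruct"  # illustrative model choice


@openai_endpoints(model_id=MODEL_ID)  # parameter name assumed; see #47
@bentoml.service(
    traffic={"timeout": 300},
    resources={"gpu": 1},
)
class VLLM:
    def __init__(self) -> None:
        from vllm import AsyncEngineArgs, AsyncLLMEngine

        # The decorated service shares this vLLM engine with the
        # OpenAI-style routes, so one deployment serves both BentoML's
        # native endpoints and OpenAI-compatible clients.
        self.engine = AsyncLLMEngine.from_engine_args(
            AsyncEngineArgs(model=MODEL_ID, enable_prefix_caching=True)
        )
```

Issue #88 asks why this helper exists given vLLM's built-in OpenAI server; the sketch only shows the design difference, namely that the decorator attaches the OpenAI surface to a BentoML Service so the same bento carries the deployment configuration and any additional endpoints.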