bentoml / OpenLLM

Run any open-source LLM, such as Llama or Mistral, as an OpenAI-compatible API endpoint in the cloud.
https://bentoml.com
Apache License 2.0

chore(deps): bump vllm from 0.4.0 to 0.4.1 in /openllm-python #969

Closed. dependabot[bot] closed this PR 6 months ago.

dependabot[bot] commented 6 months ago

Bumps vllm from 0.4.0 to 0.4.1.
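
One way to sanity-check a bump like this after it lands is to confirm that the installed wheel matches the new pin. The snippet below is a hypothetical check, not part of this PR or of OpenLLM.

```python
# Hypothetical post-bump check: verify the installed vllm wheel is 0.4.1.
from importlib.metadata import version

installed = version("vllm")
assert installed == "0.4.1", f"expected vllm 0.4.1, got {installed}"
print("vllm", installed, "is installed")
```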

Release notes

Sourced from vllm's releases.

v0.4.1

Highlights

Features

  • Support and enhance CommandR+ (#3829), MiniCPM (#3893), Meta Llama 3 (#4175, #4182), and Mixtral 8x22B (#4073, #4002) (a usage sketch follows this list)
  • Support private model registration and update the support policy (#3871, #3948)
  • Support PyTorch 2.2.1 and Triton 2.2.0 (#4061, #4079, #3805, #3904, #4271)
  • Add an option to use LM Format Enforcer for guided decoding (#3868)
  • Add an option to optionally initialize the tokenizer and detokenizer (#3748)
  • Add an option to load models with tensorizer (#3476)
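
As context for the Llama 3 and sampling-related items above, here is a minimal offline-inference sketch against vLLM 0.4.1. It is illustrative only: the model ID and sampling values are assumptions, not taken from this PR or from OpenLLM's code.

```python
# Minimal vLLM 0.4.1 offline-inference sketch (illustrative; not OpenLLM code).
# The model name and sampling settings below are assumptions for demonstration.
from vllm import LLM, SamplingParams

prompts = ["What does OpenLLM do?"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Meta Llama 3 support landed in this release cycle (#4175, #4182).
llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")

for output in llm.generate(prompts, sampling_params):
    print(output.prompt, "->", output.outputs[0].text)
```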

Enhancements

Hardware

  • An Intel CPU inference backend was added (#3993, #3634)
  • The AMD backend was enhanced with a Triton kernel and e4m3fn KV cache (#3643, #3290)

What's Changed

... (truncated)

Commits
  • 468d761 [Misc] Reduce supported Punica dtypes (#4304)
  • e4bf860 [CI][Build] change pynvml to nvidia-ml-py (#4302)
  • 91f50a6 [Core][Distributed] use cpu/gloo to initialize pynccl (#4248)
  • 79a268c [BUG] fixed fp8 conflict with aqlm (#4307)
  • eace8bf [Kernel] FP8 support for MoE kernel / Mixtral (#4244)
  • 1e8f425 [Bugfix][Frontend] Raise exception when file-like chat template fails to be o...
  • 2b7949c AQLM CUDA support (#3287)
  • 62b5166 [CI] Add ccache for wheel builds job (#4281)
  • d86285a [Core][Logging] Add last frame information for better debugging (#4278)
  • d87f39e [Bugfix] Add init_cached_hf_modules to RayWorkerWrapper (#4286)
  • Additional commits viewable in compare view


Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot show ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

dependabot[bot] commented 6 months ago

Superseded by #974.