HabanaAI/vllm-fork
A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0 · 43 stars · 58 forks
Issues
#548 · [HPU] Add mark_step configurable for the decoder layer · jiminha · opened 2 hours ago · 0 comments
#547 · revert INC fixed version installation in requirements-hpu.txt for 1.19, add tmp one for 1.18 · xuechendi · opened 3 hours ago · 0 comments
#546 · Update *.sh · michalkuligowski · opened 15 hours ago · 0 comments
#545 · Update *.sh · michalkuligowski · opened 15 hours ago · 0 comments
#544 · Update cpu-test.yml · michalkuligowski · opened 18 hours ago · 0 comments
#543 · Update cpu-test.yml · michalkuligowski · opened 18 hours ago · 0 comments
#542 · 1.19.0 fast-forward merge · kzawora-intel · opened 19 hours ago · 0 comments
#541 · Update ray_hpu_executor.py · michalkuligowski · closed 21 hours ago · 0 comments
#540 · fix marlin flag set on hpu · nirda7 · closed 22 hours ago · 0 comments
#539 · fix marlin flag set on hpu · nirda7 · closed 18 hours ago · 1 comment
#538 · [SW-201504] Trigger Internal Tests - DO NOT MERGE · RonBenMosheHabana · opened 1 day ago · 0 comments
#537 · [SW-201504] Add Jenkins Tests Trigger · RonBenMosheHabana · closed 3 days ago · 0 comments
#536 · Bump aiohttp from 3.10.10 to 3.10.11 · dependabot[bot] · opened 4 days ago · 3 comments
#534 · [bucketing overhaul 3/n] Move HPUBucketingContext to vllm-hpu-extension · kdamaszk · closed 3 days ago · 0 comments
#533 · [SW-201504] Adding Test Trigger · RonBenMosheHabana · closed 4 days ago · 0 comments
#532 · Limit decode block size · mfylcek · closed 22 hours ago · 2 comments
#531 · Limit bucket size · mfylcek · closed 4 days ago · 0 comments
#530 · [bucketing overhaul 2/n] Delegate bucket management to HPUBucketingContext · kdamaszk · closed 4 days ago · 2 comments
#529 · [BUG_FIX] 405B WARMUP failed on "FATAL ERROR :: MODULE:PT_LAZY Error, ValidateSyncInputTensors tensor_data is empty" · xuechendi · closed 22 hours ago · 11 comments
#528 · Multilora regression test (compile mode) fix · rsshaik1 · opened 5 days ago · 2 comments
#527 · Fixed Multilora regression tests · rsshaik1 · closed 5 days ago · 0 comments
#526 · Skip empty steps in multi-step scheduling · jkaniecki · closed 5 days ago · 0 comments
#525 · [HPU] Add mark_step configurable for the decoder layer · jiminha · opened 6 days ago · 1 comment
#524 · [CI/BUILD] Spec decode ci · xuechendi · opened 6 days ago · 1 comment
#523 · [BUG FIX] [SPEC DECODE] 0.6.4 rebase causes incorrectness in spec decode; fix in this PR · xuechendi · opened 6 days ago · 1 comment
#522 · Update ray_hpu_executor.py · michalkuligowski · closed 5 days ago · 0 comments
#521 · Set vllm-hpu-extension to a69bb99 · madamczykhabana · closed 6 days ago · 0 comments
#520 · Set vllm-hpu-extension to 3a60b49 · madamczykhabana · closed 1 week ago · 0 comments
#519 · Use contiguous pa by default · madamczykhabana · closed 1 week ago · 0 comments
#518 · Clean-up LoRA flow · SanjuCSudhakaran · opened 1 week ago · 0 comments
#517 · Set vllm-hpu-extension to 2542c18 · iboiko-habana · closed 1 week ago · 0 comments
#516 · Enable DeepseekV2 Lite/Chat models · hlin99 · opened 1 week ago · 0 comments
#515 · Add mark_step for baichuan · YuJiankang · closed 3 days ago · 1 comment
#514 · Update .dockerignore - DO NOT MERGE - CI testing · RonBenMoshe · opened 1 week ago · 2 comments
#513 · Test PR - Do Not Merge · RonBenMoshe · closed 1 week ago · 0 comments
#512 · Test PR - Do Not Merge/Approve · RonBenMosheHabana · closed 1 week ago · 0 comments
#511 · [Feature]: Models Trained on Gaudi Do Not Work · gouki510 · opened 1 week ago · 0 comments
#510 · [Usage]: Cannot use max_model_len greater than 8192 Tokens for llama 3.1 70B · ppatel-eng · opened 1 week ago · 1 comment
#509 · Add valid_seq_lengths to fusedsdpa - port from 1.18.0 · iboiko-habana · closed 1 week ago · 0 comments
#508 · [BUGFIX] fix worker selector non-return issue · xuechendi · closed 1 week ago · 0 comments
#507 · 1.19 documentation update · kzawora-intel · opened 1 week ago · 0 comments
#506 · Random sampler warmup · mfylcek · closed 5 days ago · 1 comment
#505 · Terminate ray workers on ray_hpu_executor shutdown · kzawora-intel · closed 1 week ago · 0 comments
#504 · Add FP8 inference procedure · afierka-intel · closed 1 week ago · 0 comments
#503 · Resolved ALIBI bias regression due to porting flat PA · tannervoas742 · opened 1 week ago · 0 comments
#502 · [BUGFIX] fix FP8 failing issue on habana_main [PatchedVLLMKVCache fwd error] · xuechendi · closed 1 week ago · 1 comment
#501 · Warmup for multi-step scheduling · tzielinski-habana · closed 1 week ago · 0 comments
#500 · Enable patching matmuls in block2batch and batch2block · nirda7 · closed 1 week ago · 0 comments
#499 · HPU Specific benchmarks of vLLM · nageshdn · opened 1 week ago · 4 comments
#498 · Update readme · michalkuligowski · closed 1 week ago · 0 comments