HabanaAI / vllm-fork
A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0 · 39 stars · 48 forks

Issues
| #    | Title | Author | Status | Comments |
|------|-------|--------|--------|----------|
| #436 | Add HPU specific changes to benchmark_latency.py | kdamaszk | opened 24 minutes ago | 0 |
| #435 | Lora layers | rsshaik1 | opened 2 days ago | 0 |
| #434 | block_groups | jmaksymczuk | closed 2 days ago | 0 |
| #433 | Contiguous PA | mfylcek | opened 2 days ago | 1 |
| #432 | Revert "Contiguous PA" | madamczykhabana | closed 2 days ago | 0 |
| #431 | Create scorecard.yml | rozhukov | opened 2 days ago | 0 |
| #430 | Add HPU information to collect_env script | michalkuligowski | opened 2 days ago | 0 |
| #429 | Add fp8 test to jenkins CI | afierka-intel | opened 3 days ago | 0 |
| #428 | Set vllm-hpu-extension to 341a77f | madamczykhabana | closed 3 days ago | 0 |
| #427 | Fix one_hot bug in torch compile mode | yuwenzho | opened 3 days ago | 0 |
| #426 | Reduce block fragmentation | yangw1234 | opened 3 days ago | 2 |
| #425 | Enable Dynamic MoE for Mixtral on 1.19.0 | tpawlows | closed 2 days ago | 0 |
| #424 | Contiguous PA | mfylcek | closed 2 days ago | 3 |
| #423 | Update README_GAUDI about fp8 calibration procedure | afierka-intel | closed 3 days ago | 0 |
| #422 | fix profiler end for `prepare_input_tensor` | jikunshang | opened 4 days ago | 0 |
| #421 | [WIP] GPTQ Support | maktukmak | opened 4 days ago | 0 |
| #420 | Add support for various softmax normalization options | madamczykhabana | closed 4 days ago | 0 |
| #419 | [Bug]: Engine loop has died | warlock135 | opened 4 days ago | 7 |
| #418 | Support long contexts with LoRA | SanjuCSudhakaran | closed 55 minutes ago | 0 |
| #417 | [Bug]: Error executing method determine_num_available_blocks. This might cause deadlock in distributed execution. | pranjalst | opened 5 days ago | 2 |
| #416 | [Bug]: MQLLMEngine dies after a period of inactivity | Xaenalt | opened 5 days ago | 2 |
| #415 | Change profile Run batch based on max_seq_len | hlahkar | closed 5 days ago | 1 |
| #414 | Remove CPU sync before Sampler | kdamaszk | closed 6 days ago | 0 |
| #413 | Remove redundant set_active_loras call during warmup | SanjuCSudhakaran | closed 5 days ago | 0 |
| #412 | Remove if blocks smaller than bs in generate_decode_buckets | kamil-kaczor | closed 6 days ago | 0 |
| #411 | [CI] Add torch compile tests to common definition | anko-intel | closed 3 days ago | 0 |
| #410 | Add DeepSeek-V2-Lite/DeepSeek-V2-Lite-Chat model support | hlin99 | opened 1 week ago | 0 |
| #409 | Remove unnecessary CPU synchronization in steps < num_steps - 1 | jmaksymczuk | opened 1 week ago | 0 |
| #408 | Multi step scheduling | tzielinski-habana | opened 1 week ago | 0 |
| #407 | [PoC] Add max padding ratio to padding aware scheduler | kzawora-intel | opened 1 week ago | 0 |
| #406 | Add HPU specific arguments to benchmark_throughput | kdamaszk | closed 6 days ago | 0 |
| #405 | [Bug]: `--enable-lora` raises error while trying to start api_server | JHLEE17 | closed 5 days ago | 5 |
| #404 | Add forward_hpu to RotaryEmbedding, remove custom module | kzawora-intel | closed 6 days ago | 0 |
| #403 | [WIP] add multi step scheduling feature for HPU | jikunshang | opened 1 week ago | 0 |
| #402 | Add WA for RuntimeError: "fill_cpu" not implemented for 'Float8_e4m3fn' | kzawora-intel | closed 1 week ago | 0 |
| #401 | Oct 16 rebase | kzawora-intel | opened 1 week ago | 0 |
| #400 | Remove HPU changes from cache_engine.py | kzawora-intel | closed 1 week ago | 0 |
| #399 | Create run-lm-eval-mmlu.sh | michalkuligowski | opened 1 week ago | 0 |
| #398 | WA for OOM in qwen 2 - sync after loading weights | michalkuligowski | opened 1 week ago | 0 |
| #397 | prevent multiple imports of habana_frameworks | hsubramony | closed 5 days ago | 0 |
| #396 | Workaround for OOM during loading llama-405 | afierka-intel | closed 1 week ago | 0 |
| #395 | [bucketing overhaul 2/n] Delegate bucket management to HPUBucketingContext | kzawora-intel | opened 1 week ago | 1 |
| #394 | [bucketing overhaul 1/n] Add padding-aware scheduling and option to limit prefill batch size | kzawora-intel | closed 1 week ago | 0 |
| #393 | Dockerfile.hpu: set build type to release, add CFLAGS,CXXFLAGS | dtrifiro | closed 1 week ago | 2 |
| #392 | [CI] Temporarily increase test tolerances | kzawora-intel | closed 1 week ago | 0 |
| #391 | Add quickstart section to READMEs | kzawora-intel | closed 1 week ago | 0 |
| #390 | Update SynapseAI version in README & Dockerfile | kzawora-intel | closed 1 week ago | 0 |
| #389 | Reformat README_GAUDI.md | kzawora-intel | closed 1 week ago | 0 |
| #388 | [CI] Prepare separate Jenkins tests for torch compile mode | anko-intel | closed 1 week ago | 0 |
| #387 | Remove workaround added to resolve multi-card stall issue | SanjuCSudhakaran | closed 1 week ago | 0 |