PygmalionAI aphrodite-engine issues

Large-scale LLM inference engine

https://aphrodite.pygmalion.chat

GNU Affero General Public License v3.0

1.14k stars, 127 forks

issues

feat: multi-step scheduling

#831 AlpinDale closed 13 hours ago
0
fix: unbound tokenizer error

#830 AlpinDale closed 16 hours ago
0
feat: add metrics for prefix cache hit rate

#829 AlpinDale closed 2 days ago
0
feat: add cuda sampling kernels for top_k and top_p

#828 AlpinDale closed 2 days ago
0
feat: Add DRY (Do not Repeat Yourself) sampling

#827 selalipop opened 2 days ago
10
fix: sampler test with new transformers version

#826 AlpinDale closed 2 days ago
0
feat: implement top-nsigma sampling method

#825 AlpinDale closed 2 days ago
7
SPMD optimizations

#824 AlpinDale closed 2 days ago
0
feat: support chunked prefill with LoRA

#823 AlpinDale closed 4 days ago
0
feat: add chat method for LLM class

#822 AlpinDale closed 4 days ago
0
fix: tokenization api test

#821 AlpinDale closed 4 days ago
0
[Tracker]: Passing all unit tests

#820 AlpinDale opened 4 days ago
0
build(deps): bump cross-spawn from 7.0.3 to 7.0.5 in /docs

#819 dependabot[bot] opened 4 days ago
0
fix: --max-seq-len-to-capture arg

#818 AlpinDale closed 5 days ago
0
Some fixes

#817 Naomiusearch opened 6 days ago
0
[Bug]: Argument --max-seq_len-to-capture not recognized

#816 Nero10578 closed 5 days ago
1
[Installation]: Cannot find CUDA_TOOLKIT_ROOT_DIR while trying to build for ROCm

#815 RuntimeRacer opened 1 week ago
1
fix: temperature issues

#814 50h100a closed 1 week ago
0
Mask dynatemp using min/max, rather than exp

#813 50h100a closed 1 week ago
0
[Usage]: Aphrodite Engine: KV Cache Context Length Issue with Quantized Models

#812 murtaza-nasir closed 1 week ago
1
feat: add Tencent Hunyuan model support

#811 AlpinDale opened 1 week ago
0
[Bug]: v0.6.3(.post1?) regression

#810 dirkson opened 1 week ago
0
[Bug]: 0.6.3.post1 regression: RuntimeError during mem profiling on Mistral Large AWQ with `-q awq_marlin`

#809 khanonnie opened 2 weeks ago
2
feat: update to serviceinfo v0.2

#808 AlpinDale closed 2 weeks ago
0
feat: add serviceinfo endpoint

#807 AlpinDale closed 2 weeks ago
0
[Misc]: log input and output

#806 Eve-146T opened 2 weeks ago
0
frontend: add an `ai-plugin.json` route

#805 AlpinDale closed 2 weeks ago
1
[Bug]: .\gguf_to_torch.py broken along with direct load GGUF

#804 sorasoras opened 2 weeks ago
2
frontend: enable kobold api by default

#803 AlpinDale closed 2 weeks ago
0
[Bug]: The documentation page is down and empty

#802 puppetm4st3r opened 2 weeks ago
5
ci: bump to 0.6.3.post1

#801 AlpinDale closed 2 weeks ago
0
fix: compilation of gptq_marlin_gemm object

#800 AlpinDale closed 2 weeks ago
0
ci: bump version to 0.6.3

#799 AlpinDale closed 2 weeks ago
0
feat: add TP support for bitsandbytes

#798 AlpinDale opened 2 weeks ago
0
fix: kobold lite embedded UI on windows

#797 AlpinDale closed 2 weeks ago
0
build(deps): bump rollup from 4.21.0 to 4.24.3 in /docs

#796 dependabot[bot] closed 2 weeks ago
0
feat: add HQQ quantization support

#795 AlpinDale closed 2 weeks ago
0
fix: windows wheel url

#794 AlpinDale closed 2 weeks ago
0
[Usage]: Distributed Inference Without Docker.

#793 Abdulhanan535 opened 3 weeks ago
3
[New Method]: VPTQ, Vector Post-Training Quantization

#792 YangWang92 opened 3 weeks ago
2
[Installation]: Unable to make openvino / CPU install from source work: "Failed to import from aphrodite._C with No module named 'aphrodite._C'"

#791 bolaft opened 3 weeks ago
0
feat: windows support

#790 AlpinDale closed 2 weeks ago
8
[Bug]: unable to load 14B Qwen2.5 GGUF with newest version (0.6.2.post1)

#789 NeoChen1024 opened 4 weeks ago
1
[Bug]: strange repetition issue

#788 ehartford opened 1 month ago
6
frontend: minor logging improvements

#787 AlpinDale closed 2 weeks ago
0
[Bug]: Several errors when deploying GGUF models

#786 musoles opened 1 month ago
0
Stream models rather than load them completely into RAM.

#785 50h100a closed 1 month ago
2
[Installation]: FYI: they fixed the stupid conda pytorch-cuda=12.4 / cuda 12.4.1 strict dependency issue

#784 BlairSadewitz opened 1 month ago
0
[Bug]: Impossible dependency requirement with GGUF

#783 musoles opened 1 month ago
0
[Bug]: Metrics incorrect when having zero throughput

#782 mrseeker opened 1 month ago
0