As of #769, our unit tests have been overhauled. This issue tracks the progress toward passing all of them.
Current status:
- Total test count: 183
- Passed tests: 128
- Partial fail: 23
- Complete fail: 27
- Untested: 5
To run a test, first install the development dependencies:

```sh
pip install -r requirements-dev.txt
```

Then:

```sh
pytest tests/your_test_module.py
```
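When chasing down a partially failing suite, pytest's standard selection flags are useful for narrowing a run to specific items. The module and test names below are placeholders, not actual tests from this repository:

```shell
# Run one test module verbosely (path is a placeholder)
pytest -v tests/test_config.py

# Run a single test function by its node ID (hypothetical name)
pytest "tests/test_config.py::test_parse"

# Run only tests whose names match a keyword expression
pytest tests/ -k "chunked_prefill"

# Stop at the first failing item
pytest -x tests/test_regression.py
```

The per-item pass counts listed below (e.g. 3/4) are taken from pytest's end-of-run summary.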
Tests that currently pass will be marked with a ✅, tests that do not pass will be marked with ❌, and untested ones will be left blank (default).
> [!NOTE]
> A failed test does not mean that the associated feature does not work. A test may have many items (sometimes hundreds), and the number of passed items will be logged for each test. Some feature tests may completely fail, but still work end-to-end.
The following features are known to be currently broken:

- Llava-based Vision Model Loading
- Out-Of-Tree model registration
## General Tests

- [x] test_cache_block_hashing
- [x] test_config
- [x] test_embedded_commit
- [x] test_inputs
- [x] test_logits_processor
- [ ] test_regression ❌ (3/4) -- the VRAM release test fails
- [x] test_sampling_params
- [x] test_scalartype
- [x] test_sequence
- [x] test_sharded_state_loader
- [x] test_utils
## Async Aphrodite

- [ ] test_api_server_async_aphrodite ❌
- [x] test_async_aphrodite
- [x] test_chat_template
- [ ] test_openapi_server_ray ❌ (2/3)
- [x] test_request_tracker
## Basic Correctness

- [x] test_basic_correctness
- [ ] test_chunked_prefill ❌ (30/36)
- [x] test_cpu_offload
- [ ] test_preemption ❌ (3/5)
## Compilation

- [x] test_full_graph
## Core

- [x] test_block_manager
- [x] test_chunked_prefill_scheduler
- [x] test_scheduler_encoder_decoder
- [x] test_scheduler
## Distributed

- [x] test_basic_distributed_correctness
- [x] test_basic_distributed_correctness_enc_dec
- [x] test_chunked_prefill_distributed
- [x] test_comm_ops
- [x] test_custom_all_reduce
- [ ] test_distributed_oot ❌ (0/1)
- [ ] test_multimodal_broadcast ❌ (0/6)
- [ ] test_pipeline_parallel ❌ (1/10)
- [ ] test_pipeline_partition ❌ (0/1)
- [x] test_pp_cudagraph
- [x] test_pynccl
- [x] test_same_node (run with `APHRODITE_TEST_SAME_HOST=1 torchrun --nproc-per-node=4 tests/distributed/test_same_node.py`)
- [x] test_shm_broadcast
## Endpoints

### OpenAI

- [ ] test_audio ❌ (1/4)
- [x] test_basic
- [ ] test_chat ❌ (25/33)
- [ ] test_completion ❌ (77/112)
- [x] test_embedding
- [x] test_encoder_decoder
- [x] test_guided_processors
- [x] test_metrics
- [x] test_models
- [ ] test_mp_api_server (takes too long to run, investigate)
- [ ] test_oot_registeration ❌ (0/1)
- [x] test_return_tokens_as_ids
- [x] test_run_batch
- [x] test_serving_chat
- [ ] test_shutdown ❌ (0/1)
- [x] test_tokenization
- [ ] test_vision ❌ (0/16) -- seems to be an issue with fetching the images