Open xuechendi opened 6 days ago
Hi, what is the total runtime of the added tests?
tests/spec_decode/e2e/test_mlp_correctness.py::test_mlp_e2e_greedy_correctness[1-1-128-test_llm_kwargs0-baseline_llm_kwargs0-per_test_common_llm_kwargs0-common_llm_kwargs0] PASSED [ 50%]
tests/spec_decode/e2e/test_mlp_correctness.py::test_mlp_e2e_greedy_correctness[1-32-128-test_llm_kwargs0-baseline_llm_kwargs0-per_test_common_llm_kwargs0-common_llm_kwargs0] PASSED [100%]
===================================================================== warnings summary =====================================================================
../../../usr/lib/python3.10/inspect.py:288
/usr/lib/python3.10/inspect.py:288: FutureWarning: `torch.distributed.reduce_op` is deprecated, please use `torch.distributed.ReduceOp` instead
return isinstance(object, types.FunctionType)
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================================== 2 passed, 1 warning in 57.45s ===============================================================
real 1m2.861s
user 2m55.088s
sys 0m49.803s
time VLLM_SKIP_WARMUP=True pytest -v tests/spec_decode/e2e/test_medusa_correctness.py::test_medusa_e2e_greedy_correctness
=================================================================== test session starts ====================================================================
platform linux -- Python 3.10.12, pytest-8.3.3, pluggy-1.5.0 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /workspace/vllm/vllm
configfile: pyproject.toml
plugins: anyio-4.6.2.post1
collected 2 items
tests/spec_decode/e2e/test_medusa_correctness.py::test_medusa_e2e_greedy_correctness[1-1-128-test_llm_kwargs0-baseline_llm_kwargs0-per_test_common_llm_kwargs0-common_llm_kwargs0] PASSED [ 50%]
tests/spec_decode/e2e/test_medusa_correctness.py::test_medusa_e2e_greedy_correctness[1-32-128-test_llm_kwargs0-baseline_llm_kwargs0-per_test_common_llm_kwargs0-common_llm_kwargs0] PASSED [100%]
===================================================================== warnings summary =====================================================================
../../../usr/lib/python3.10/inspect.py:288
/usr/lib/python3.10/inspect.py:288: FutureWarning: `torch.distributed.reduce_op` is deprecated, please use `torch.distributed.ReduceOp` instead
return isinstance(object, types.FunctionType)
tests/spec_decode/e2e/test_medusa_correctness.py::test_medusa_e2e_greedy_correctness[1-1-128-test_llm_kwargs0-baseline_llm_kwargs0-per_test_common_llm_kwargs0-common_llm_kwargs0]
tests/spec_decode/e2e/test_medusa_correctness.py::test_medusa_e2e_greedy_correctness[1-32-128-test_llm_kwargs0-baseline_llm_kwargs0-per_test_common_llm_kwargs0-common_llm_kwargs0]
/workspace/vllm/vllm/vllm/model_executor/model_loader/weight_utils.py:425: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
state = torch.load(bin_file, map_location="cpu")
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
========================================================= 2 passed, 3 warnings in 77.72s (0:01:17) =========================================================
real 1m23.139s
user 3m59.330s
sys 0m57.539s
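To put a single number on the question above, the two pytest summary lines can be totalled with a short script. This is only a sketch: the helper name `total_seconds` and the `SUMMARY_LINES` list are made up for illustration, and the strings are copied by hand from the logs above rather than read from CI.

```python
import re

# Summary lines copied by hand from the two pytest runs above (MLP, then Medusa).
SUMMARY_LINES = [
    "2 passed, 1 warning in 57.45s",
    "2 passed, 3 warnings in 77.72s (0:01:17)",
]

# Matches the "in <seconds>s" part of a pytest summary line.
DURATION_RE = re.compile(r"\bin (\d+(?:\.\d+)?)s\b")


def total_seconds(lines):
    """Sum the durations that pytest reports in its summary lines."""
    total = 0.0
    for line in lines:
        match = DURATION_RE.search(line)
        if match:
            total += float(match.group(1))
    return total


if __name__ == "__main__":
    print(f"total pytest time: {total_seconds(SUMMARY_LINES):.2f}s")
```

With the two runs above this gives about 135.17 s of pytest time. The `real` times reported by `time` add up to roughly 146 s (1m2.861s + 1m23.139s), since they also include interpreter startup and test collection.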
Add spec decode CI