deepseek-ai / DeepSeek-Prover-V1.5


configs/newfile results in vLLM error #3

Closed: sethahrenbach closed this issue 1 month ago

sethahrenbach commented 2 months ago

I made a new file in configs/:

from prover.utils import AttrDict
from prover.algorithms import RMaxTS

# dataset
data_path = 'datasets/bin_addition.jsonl'
data_split = 'valid'
data_repeat = 16  # run 16 * 6400

# verifier
lean_max_concurrent_requests = 64
lean_memory_limit = 10
lean_timeout = 300

# model
batch_size = 512
model_path = 'deepseek-ai/DeepSeek-Prover-V1.5-RL'
model_args = AttrDict(
    mode='cot', # `cot` or `non-cot`
    temperature=1,
    max_tokens=2048,
    top_p=0.95,
)

# algorithm
n_search_procs = 256
sampler = dict(
    algorithm=RMaxTS,
    gamma=0.99,
    sample_num=6400,
    concurrent_num=32,
    tactic_state_comment=True,
    ckpt_interval=128,
    log_interval=32,
)

It uses the same settings as the configs/RMaxTS.py file, but I changed data_path to point to a JSONL file with a single simple theorem. When I run python -m prover.launch --config=configs/RMaxTSbin.py --log_dir=logs/RMaxTSbin_results, it fails with the following error:

ValueError: The model's max seq len (4096) is larger than the maximum number of tokens that can be stored in KV cache (2112). Try increasing `gpu_memory_utilization` or decreasing `max_model_len` when initializing the engine.
INFO 08-27 11:09:45 model_runner.py:976] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
INFO 08-27 11:09:45 model_runner.py:980] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing `gpu_memory_utilization` or enforcing eager mode. You can also reduce the `max_num_seqs` as needed to decrease memory usage.
INFO 08-27 11:09:47 model_runner.py:1057] Graph capturing finished in 2 secs.
All 2 LLMProcesses stopped

which I believe is related to https://github.com/vllm-project/vllm/issues/2418
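
For context, vLLM raises this ValueError during engine startup when the KV cache that fits in the reserved GPU memory cannot hold max_model_len tokens. The two knobs the message mentions are both constructor arguments on vllm.LLM; the sketch below is only illustrative (model name copied from the config above, values are placeholders), not code from this repository.

from vllm import LLM

# Illustrative only: either raise gpu_memory_utilization (default 0.9) so more
# GPU memory is reserved for the KV cache, or cap max_model_len so the context
# fits into the cache that is available.
llm = LLM(
    model='deepseek-ai/DeepSeek-Prover-V1.5-RL',
    trust_remote_code=True,
    gpu_memory_utilization=0.95,
    # max_model_len=2048,
)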

My new dataset file (datasets/bin_addition.jsonl) is:

{"name": "binaryAdditionLength", "split": "valid", "informal_prefix": "/-- Let `binaryAddition` be a function that takes two lists of booleans and returns a list of booleans representing their sum in binary. Prove that the length of the output is always positive (greater than zero). -/\n", "formal_statement": "theorem binaryAdditionLength (s t : List Bool) :\n  (binaryAddition s t).length > 0 := by\n", "goal": "s t : List Bool\n⊢ (binaryAddition s t).length > 0", "header": "import Mathlib\n\nopen List\n\ndef binaryAddition (s t : List Bool) : List Bool :=\n  let rec loop : Nat → List Bool → List Bool\n    | i, acc =>\n      if i ≥ 10 then\n        acc.reverse\n      else\n        match s.get? i, t.get? i with\n        | some c, some d =>\n          let sum := (if c then 1 else 0) + (if d then 1 else 0) + (if acc.head? = some true then 1 else 0)\n          loop (i+1) ((sum % 2 = 1) :: acc)\n        | _, _ => acc.reverse\n  termination_by i _ => 10 - i\n  loop 0 []\n\n"}
sethahrenbach commented 2 months ago

In prover/workers/generator.py I changed line 32 to

llm = LLM(model=self.model_path, max_num_batched_tokens=8192, seed=seed, trust_remote_code=True, max_model_len=1096)

adding the max_model_len argument, and I got past the vLLM error.
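
One caveat worth flagging (an observation, not something reported in the thread): the config above requests max_tokens=2048, which exceeds a max_model_len of 1096, so long generations may be truncated or rejected depending on the vLLM version. If the GPU has headroom, an alternative (untested) sketch for the same line keeps the model's default context window and raises gpu_memory_utilization instead:

# Untested alternative for prover/workers/generator.py line 32: keep the default
# 4096-token context and reserve more GPU memory for the KV cache instead.
llm = LLM(model=self.model_path, max_num_batched_tokens=8192, seed=seed,
          trust_remote_code=True, gpu_memory_utilization=0.95)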