Suppose text1 is the prompt (context + question) of request 1 and text2 is the output message of request 1. Then text1 + text2 becomes the context of request 2. Following this logic, I ran five-turn tests; the number of tokens grows from under 100 to about 30,000 across the turns.
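For reference, the test loop looks roughly like the sketch below. This is a minimal sketch: the OpenAI-compatible client, endpoint, model name, and placeholder questions are assumptions for illustration, not the exact harness I used.

```python
# Minimal sketch of the five-turn loop described above. The endpoint,
# model name, and questions are illustrative assumptions.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")
QUESTIONS = [f"Question {i}?\n" for i in range(1, 6)]  # placeholder questions

context = ""
for turn, question in enumerate(QUESTIONS, start=1):
    prompt = context + question                 # text1 of this request
    start = time.perf_counter()
    stream = client.completions.create(
        model="my-model",                       # hypothetical model name
        prompt=prompt,
        max_tokens=512,
        stream=True,
    )
    ttft, output = None, ""
    for chunk in stream:
        if ttft is None:
            ttft = time.perf_counter() - start  # time to first token
        output += chunk.choices[0].text or ""
    print(f"turn {turn}: TTFT = {ttft:.3f}s")
    context = prompt + output                   # text1 + text2 -> next context
```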
Performance:
Enabling save_decode_cache did not reduce TTFT:
| Turn | TTFT (s), save_decode_cache: True | TTFT (s), save_decode_cache: False |
|------|-----------------------------------|------------------------------------|
| 1    | 0.335                             | 0.107                              |
| 2    | 0.275                             | 0.282                              |
| 3    | 0.849                             | 0.689                              |
| 4    | 1.497                             | 1.554                              |
| 5    | 3.759                             | 3.792                              |
Observations from the logs:
- From stderr.log, the retrieved number of chunks across the five turns is 0 0 2 2 2. With save_decode_cache = True, more chunks should be retrieved; with save_decode_cache = False, no chunks should be retrieved at all.
- From stdout.log, the last four token ids (after 28705) in prompt_token_ids are missing in the next turn's request.
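One way to confirm the missing ids is to diff consecutive prompt_token_ids entries parsed out of stdout.log. The sketch below assumes log lines of the form `prompt_token_ids: [1, 2, 3]`; the actual log format may differ, so the regex is an assumption.

```python
# Sketch: verify that each turn's prompt_token_ids starts with the previous
# turn's ids. Per the report above, the last four ids (after 28705) are
# dropped, so the prefix check should fail for the affected turns.
import ast
import re

def parse_prompt_ids(path: str) -> list[list[int]]:
    ids = []
    with open(path) as f:
        for line in f:
            m = re.search(r"prompt_token_ids:\s*(\[.*\])", line)
            if m:
                ids.append(ast.literal_eval(m.group(1)))
    return ids

turns = parse_prompt_ids("stdout.log")
for i, (prev, curr) in enumerate(zip(turns, turns[1:]), start=1):
    # The next turn's prompt should extend the previous turn's prompt.
    ok = curr[: len(prev)] == prev
    print(f"turn {i} -> {i + 1}: prefix preserved = {ok}")
```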