IBM / text-generation-inference

IBM development fork of https://github.com/huggingface/text-generation-inference
Apache License 2.0

Log number of KVCacheManager blocks at init #87

Closed · tdoublep closed this pull request 4 months ago

tdoublep commented 4 months ago

Motivation

Users are running out of KV cache blocks on GPUs with less than 80GB of memory.

Modifications

This PR simply adds a log line reporting the number of free KVCacheManager blocks at start-up time.
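
For reference, a minimal sketch of what this kind of start-up logging could look like; the class and attribute names here are illustrative, not necessarily the ones used in this fork:

```python
import logging

logger = logging.getLogger(__name__)


class KVCacheManager:
    """Illustrative stand-in for a paged KV cache manager."""

    def __init__(self, total_num_gpu_blocks: int):
        # All blocks are free before any sequences are scheduled.
        self.total_num_gpu_blocks = total_num_gpu_blocks
        self.free_blocks = list(range(total_num_gpu_blocks))

        # The change proposed here: report the block count at init
        # so users on smaller GPUs can see what they are starting from.
        logger.info(
            "KVCacheManager initialized with %d free GPU blocks",
            len(self.free_blocks),
        )


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    KVCacheManager(total_num_gpu_blocks=2048)
```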

Result

This will help us debug the issue with users. For example, we could suggest setting the environment variable KV_CACHE_MANAGER_NUM_GPU_BLOCKS to manually increase the number of blocks, but we first need to know how many blocks they are starting from.
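
As a hedged sketch of how such an override could be consumed, assuming the manually set KV_CACHE_MANAGER_NUM_GPU_BLOCKS takes precedence over a value computed from available GPU memory (the function name and fallback logic are illustrative only):

```python
import os


def resolve_num_gpu_blocks(computed_num_blocks: int) -> int:
    """Return the block count to use, preferring the manual override.

    If KV_CACHE_MANAGER_NUM_GPU_BLOCKS is set, it overrides the value
    the server computed from available GPU memory (illustrative logic).
    """
    override = os.environ.get("KV_CACHE_MANAGER_NUM_GPU_BLOCKS")
    if override is not None:
        return int(override)
    return computed_num_blocks


# Example: a user on a smaller GPU exports
# KV_CACHE_MANAGER_NUM_GPU_BLOCKS=4096 before launching the server,
# and that value wins over whatever was computed automatically.
print(resolve_num_gpu_blocks(computed_num_blocks=1500))
```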

Related Issues

https://huggingface.co/ibm-fms/granite-7b-lab-accelerator/discussions/1