Motivation
Users are running out of blocks on GPUs with less than 80 GB of memory.
Modifications
This PR adds a printout of the number of free blocks at start-up time.
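As a rough illustration of the kind of start-up message this adds, here is a minimal sketch. It is not the code in this PR: the function names `resolve_num_gpu_blocks` and `startup_message`, the message wording, and the assumption that KV_CACHE_MANAGER_NUM_GPU_BLOCKS overrides an automatically estimated block count are all hypothetical.

```python
import os


def resolve_num_gpu_blocks(estimated_blocks: int) -> int:
    """Pick the block count to use.

    Assumption: if KV_CACHE_MANAGER_NUM_GPU_BLOCKS is set, it replaces the
    automatically estimated value; otherwise the estimate is used as-is.
    """
    override = os.getenv("KV_CACHE_MANAGER_NUM_GPU_BLOCKS")
    return int(override) if override is not None else estimated_blocks


def startup_message(total_blocks: int, free_blocks: int) -> str:
    """Format the start-up log line (wording is hypothetical)."""
    return (
        f"KV cache manager: {free_blocks} of {total_blocks} "
        "GPU blocks free at start-up"
    )


if __name__ == "__main__":
    # 1024 is a placeholder estimate, not a measured value.
    total = resolve_num_gpu_blocks(estimated_blocks=1024)
    # At start-up nothing has been allocated yet, so all blocks are free.
    print(startup_message(total, total))
```

With a line like this in the logs, a user can report their starting block count before anyone suggests a different value for the environment variable.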
Result
This will help us debug the issue with users. For example, we could suggest that they change the environment variable KV_CACHE_MANAGER_NUM_GPU_BLOCKS to manually increase the number of blocks, but we first need to know what they are starting from.
Related Issues
https://huggingface.co/ibm-fms/granite-7b-lab-accelerator/discussions/1