IBM / text-generation-inference

IBM development fork of https://github.com/huggingface/text-generation-inference
Apache License 2.0

Log number of KVCacheManager blocks at init #87

Closed · tdoublep closed this pull request 4 months ago

tdoublep commented 4 months ago

Motivation

Users are running out of KV cache blocks on GPUs with less than 80GB of memory.

Modifications

This PR simply adds a log line reporting the number of free KVCacheManager blocks at start-up time.
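
For reference, a minimal sketch of what this kind of start-up logging could look like; the class and attribute names here are illustrative, not necessarily the ones used in this fork:

```python
import logging

logger = logging.getLogger(__name__)


class KVCacheManager:
    """Illustrative stand-in for a paged KV cache manager."""

    def __init__(self, total_num_gpu_blocks: int):
        # All blocks are free before any sequences are scheduled.
        self.total_num_gpu_blocks = total_num_gpu_blocks
        self.free_blocks = list(range(total_num_gpu_blocks))

        # The change proposed here: report the block count at init
        # so users on smaller GPUs can see what they are starting from.
        logger.info(
            "KVCacheManager initialized with %d free GPU blocks",
            len(self.free_blocks),
        )


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    KVCacheManager(total_num_gpu_blocks=2048)
```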

Result

This will help us debug the issue with users. For example, we could suggest setting the environment variable KV_CACHE_MANAGER_NUM_GPU_BLOCKS to manually increase the number of blocks, but we first need to know how many blocks they are starting from.
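
As a hedged sketch of how such an override could be consumed, assuming the manually set KV_CACHE_MANAGER_NUM_GPU_BLOCKS takes precedence over a value computed from available GPU memory (the function name and fallback logic are illustrative only):

```python
import os


def resolve_num_gpu_blocks(computed_num_blocks: int) -> int:
    """Return the block count to use, preferring the manual override.

    If KV_CACHE_MANAGER_NUM_GPU_BLOCKS is set, it overrides the value
    the server computed from available GPU memory (illustrative logic).
    """
    override = os.environ.get("KV_CACHE_MANAGER_NUM_GPU_BLOCKS")
    if override is not None:
        return int(override)
    return computed_num_blocks


# Example: a user on a smaller GPU exports
# KV_CACHE_MANAGER_NUM_GPU_BLOCKS=4096 before launching the server,
# and that value wins over whatever was computed automatically.
print(resolve_num_gpu_blocks(computed_num_blocks=1500))
```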

Related Issues

https://huggingface.co/ibm-fms/granite-7b-lab-accelerator/discussions/1