waterhorse1 opened 1 year ago
At this time, the only way to clear the cache is to delete the Generator object. In that case, both the model and the cache will be released.
How many static prompts are you working with? Are you looking to fully clear the cache or only specific prompts?
To make it clearer: we are working on tree search algorithms. In tree search, a lot of computation can be cached, because each node's preceding trajectory has already been computed. So we are thinking of treating the preceding trajectories as static prompts, which means we could end up with as many as 500 static prompts. We want to fully clear the cache, because reloading the model also takes a long time if we simply delete the generator.
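As a rough illustration of the reuse described above (a sketch only, not this library's API), the cache below is keyed on token prefixes. When a tree node is expanded, it looks up the longest already-computed prefix of the node's trajectory, so only the remaining suffix needs new work. `compute_state` is a hypothetical stand-in for a model forward pass; here it just counts tokens so the example runs without a GPU.

```python
from typing import Dict, List, Tuple

Prefix = Tuple[int, ...]


def compute_state(prev_state: int, new_tokens: List[int]) -> int:
    # Hypothetical stand-in for a forward pass that extends a cached state.
    # Here the "state" is simply the number of tokens processed so far.
    return prev_state + len(new_tokens)


class PrefixCache:
    def __init__(self) -> None:
        self.states: Dict[Prefix, int] = {(): 0}  # empty prefix = initial state
        self.tokens_computed = 0  # tokens that actually had to be computed

    def longest_cached_prefix(self, tokens: List[int]) -> int:
        # Length of the longest prefix of `tokens` already in the cache.
        for n in range(len(tokens), -1, -1):
            if tuple(tokens[:n]) in self.states:
                return n
        return 0

    def get_state(self, tokens: List[int]) -> int:
        n = self.longest_cached_prefix(tokens)
        state = self.states[tuple(tokens[:n])]
        suffix = tokens[n:]
        # Only the uncached suffix is computed; the full trajectory is then
        # stored so that child nodes can reuse it as their prefix.
        state = compute_state(state, suffix)
        self.tokens_computed += len(suffix)
        self.states[tuple(tokens)] = state
        return state


cache = PrefixCache()
root = [1, 2, 3]
cache.get_state(root)            # computes 3 tokens
cache.get_state(root + [4])      # reuses root, computes 1 token
cache.get_state(root + [5, 6])   # reuses root, computes 2 tokens
print(cache.tokens_computed)     # 6 (vs. 12 without reuse)
```

Every stored trajectory keeps its state alive, so with hundreds of prefixes this kind of cache grows without bound unless something evicts or clears entries.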
I am wondering if there is any way to manually clear the static-prompt cache used by generator.generate_tokens. We are running an algorithm where caching can save a lot of computation; however, in our setting we have several static prompts instead of one. When we set all of them as static prompts, we observe that GPU memory keeps going up. Is there a way to manually clear the cache for static prompts here?
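A minimal sketch of the kind of control being asked for, assuming the per-prompt cache lives in an object separate from the model: entries are evicted least-recently-used once a hypothetical `max_entries` limit is reached, and `clear()` drops everything while the model stays loaded. `BoundedPromptCache`, `max_entries`, and `clear()` are illustrative names, not part of the library. On a GPU, dropping the Python references may not be enough on its own; with PyTorch, for example, `torch.cuda.empty_cache()` can additionally return cached allocator memory to the driver.

```python
from collections import OrderedDict
from typing import Hashable, Optional


class BoundedPromptCache:
    """Hypothetical per-prompt cache kept separate from the model object."""

    def __init__(self, max_entries: int) -> None:
        self.max_entries = max_entries
        self._entries: "OrderedDict[Hashable, object]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[object]:
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key: Hashable, state: object) -> None:
        self._entries[key] = state
        self._entries.move_to_end(key)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict least recently used

    def clear(self) -> None:
        # Drop every cached state; the model itself is untouched.
        self._entries.clear()

    def __contains__(self, key: Hashable) -> bool:
        return key in self._entries

    def __len__(self) -> int:
        return len(self._entries)


cache = BoundedPromptCache(max_entries=2)
cache.put("prompt-a", "state-a")
cache.put("prompt-b", "state-b")
cache.get("prompt-a")             # "prompt-a" is now most recently used
cache.put("prompt-c", "state-c")  # over the limit, so "prompt-b" is evicted
print("prompt-b" in cache)        # False
cache.clear()
print(len(cache))                 # 0
```

The LRU bound caps memory automatically during a long search, while `clear()` covers the "fully clear the cache" case without paying the model-reload cost.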