OpenNMT / CTranslate2

Fast inference engine for Transformer models
https://opennmt.net/CTranslate2
MIT License

Any way to manually clear the cache for static prompt for generator.generate_tokens? #1450

Open waterhorse1 opened 1 year ago

waterhorse1 commented 1 year ago

I am wondering if there is any way to manually clear the static-prompt cache for generator.generate_tokens. We are running an algorithm where a lot of computation can be saved by caching; however, in our setting we have several static prompts instead of one. When we set all of them as static prompts, we observe that the GPU memory keeps going up. Can we manually clear the static-prompt cache here?
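The memory growth can be illustrated with a minimal pure-Python sketch of a per-prompt cache. This is a hypothetical stand-in, not CTranslate2's actual internals: the class name, the dictionary keyed by prompt tokens, and the `clear()` hook are all assumptions made for illustration.

```python
class PromptCache:
    """Hypothetical model of a static-prompt cache: one entry per distinct prompt."""

    def __init__(self):
        self._cache = {}  # static prompt tokens -> precomputed "state"

    def get_or_compute(self, prompt):
        key = tuple(prompt)
        if key not in self._cache:
            # Stand-in for the expensive prefill pass over the static prompt.
            self._cache[key] = [tok.upper() for tok in prompt]
        return self._cache[key]

    def clear(self):
        # The manual clearing hook this issue is asking for.
        self._cache.clear()


cache = PromptCache()
s1 = cache.get_or_compute(["hello", "world"])
s2 = cache.get_or_compute(["hello", "world"])  # second call hits the cache
print(s1 is s2)            # True: the same prompt is not recomputed
cache.get_or_compute(["another", "prompt"])
print(len(cache._cache))   # 2: one entry per distinct static prompt
cache.clear()
print(len(cache._cache))   # 0: memory for cached states can now be released
```

With many distinct static prompts, each one adds an entry that is never evicted, which matches the observed unbounded GPU memory growth.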

guillaumekln commented 1 year ago

At this time the only way to clear the cache is to delete the Generator object. In that case both the model and the cache will be released.

How many static prompts are you working with? Are you looking to fully clear the cache or only specific prompts?

waterhorse1 commented 1 year ago

Let me make it clearer. We are working on tree search algorithms, where a lot of computation can be cached because each node's preceding trajectory has already been computed. So we are thinking of treating the preceding trajectories as static prompts, which means we can end up with as many as 500 static prompts. We are looking to fully clear the cache, because reloading the model also takes a long time if we delete the generator directly.