pankajroark closed 3 months ago
At very high batch sizes, the default of 0.9 is not sufficient because it doesn't leave enough GPU memory for non-KV-cache use. We need to be able to lower it.
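To illustrate the trade-off, here is a rough sketch of how the headroom changes as the fraction is lowered. It assumes the 0.9 value is the share of free GPU memory reserved for the KV cache; the function name `non_kv_headroom_gib` and the 40 GiB figure are hypothetical and not taken from this change.

```python
def non_kv_headroom_gib(free_gib: float, kv_fraction: float) -> float:
    """Return the GiB left for everything other than the KV cache.

    Assumption: kv_fraction is the share of free GPU memory given to the
    KV cache, so the remainder is what other allocations can use.
    """
    if not 0.0 < kv_fraction <= 1.0:
        raise ValueError("kv_fraction must be in (0, 1]")
    return free_gib * (1.0 - kv_fraction)


# Hypothetical free memory after model weights are loaded.
free_gib = 40.0
for fraction in (0.9, 0.8, 0.7):
    headroom = non_kv_headroom_gib(free_gib, fraction)
    print(f"fraction={fraction:.1f} -> non-KV headroom={headroom:.1f} GiB")
```

Under these assumptions, the non-KV headroom is fixed by the fraction while memory needed outside the KV cache can grow with batch size, which is why a lower value may be needed at very high batch sizes.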
Testing: Tested manually on dev.