Added inference with a KV cache, optimized to load the model once rather than once per task.
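A minimal sketch of the load-once pattern, assuming a generic task runner; the names `load_model` and `run_task` are illustrative, not taken from the actual code:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def load_model(model_name: str):
    # Expensive load happens once per model name; later calls
    # for other tasks reuse the cached instance.
    print(f"loading {model_name}")
    return {"name": model_name}

def run_task(model_name: str, task: str) -> str:
    model = load_model(model_name)  # cache hit after the first task
    return f"{model['name']}:{task}"

# The model is loaded a single time for all three tasks.
results = [run_task("llama-7b", t) for t in ["task_a", "task_b", "task_c"]]
```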
Added a vLLM backend for faster inference; vLLM with KV cache objects is not supported yet.
Modified the eval script to also store an overall average across tasks.
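One way the overall average could be stored alongside the per-task scores; the dict shape and the `overall_average` key are assumptions, not the script's actual output format:

```python
def add_overall_average(scores: dict[str, float]) -> dict[str, float]:
    # scores maps task name -> score; the overall average goes
    # under an extra key next to the per-task results.
    out = dict(scores)
    out["overall_average"] = sum(scores.values()) / len(scores)
    return out

results = add_overall_average({"task_a": 0.8, "task_b": 0.6})
```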
Added a bash script for running the eval with different models and sequence lengths without needing to modify scripts or configs.
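A sketch of what such a wrapper might look like; the model names, the `eval.py` entry point, and its flags are placeholders, not the repository's real script:

```shell
#!/usr/bin/env bash
# Sweep every model / sequence-length combination without
# touching any config files.
set -euo pipefail

MODELS=("llama-7b" "mistral-7b")
SEQ_LENS=(512 1024 2048)

for model in "${MODELS[@]}"; do
  for seq_len in "${SEQ_LENS[@]}"; do
    echo "running ${model} @ ${seq_len}"
    # Hypothetical invocation of the eval script:
    # python eval.py --model "${model}" --max-seq-len "${seq_len}"
  done
done
```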