THUDM / LongBench

[ACL 2024] LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
MIT License

inference with kv cache #75

Closed mohammadh-cerebras closed 1 month ago

mohammadh-cerebras commented 1 month ago

This PR makes the following changes:

- Added inference with a KV cache, optimized to load the model once rather than reloading it for every task.
- Added a vLLM backend for faster inference (vLLM with KV cache objects is not supported yet).
- Modified the eval script to also store an overall average across tasks.
- Added a bash script for running the eval with different models and different sequence lengths, without needing to modify scripts or configs.
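The last item, a sweep over models and sequence lengths, might look roughly like the sketch below. This is a hypothetical illustration, not the script from the PR: the entry-point name (`pred.py`), the flag names, and the model identifiers are assumptions; it echoes the commands as a dry run rather than executing them.

```shell
#!/usr/bin/env bash
# Hypothetical eval sweep: pred.py, --model, and --max_length are assumed
# names, not necessarily those used in this PR.
set -euo pipefail

MODELS=("llama2-7b-chat-4k" "chatglm3-6b-32k")   # models to evaluate (assumed names)
MAX_LENGTHS=(4096 8192)                          # sequence-length budgets to try

for model in "${MODELS[@]}"; do
  for len in "${MAX_LENGTHS[@]}"; do
    # Print each command instead of running it, so the sweep can be inspected first;
    # drop the echo to actually launch the runs.
    echo python pred.py --model "$model" --max_length "$len"
  done
done
```

Keeping the model and length lists at the top of the script is what lets you change the sweep without touching the prediction or eval code itself.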