THUDM / LongBench

[ACL 2024] LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
MIT License

inference with kv cache #75

Closed mohammadh-cerebras closed 2 months ago

mohammadh-cerebras commented 2 months ago

- Added inference with a KV cache, optimized to load the model once instead of reloading it for every task.
- Added a vLLM backend for faster inference; vLLM with KV cache objects is not supported yet.
- Modified the eval script to also store an overall average across tasks.
- Added a bash script for running the eval with different models and different sequence lengths, without needing to modify scripts or configs.
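The speedup from KV-cache inference can be sketched with a toy single-head attention step (a pure-Python illustration, not code from this PR; all names here are hypothetical): caching past keys and values means each decoding step only computes K/V for the new token, yet yields the same outputs as recomputing attention from scratch.

```python
import math

def attend(q, keys, values):
    """Scaled dot-product attention of one query over cached keys/values."""
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(len(q)) for k in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    w = [e / z for e in exps]
    dim = len(values[0])
    return [sum(w[i] * values[i][j] for i in range(len(values))) for j in range(dim)]

# Toy per-token key/value pairs and queries (hypothetical 2-d vectors).
tokens_kv = [([1.0, 0.0], [0.5, 0.5]), ([0.0, 1.0], [1.0, 0.0]), ([1.0, 1.0], [0.0, 1.0])]
queries = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]

# Incremental decoding with a KV cache: past K/V are stored once and
# reused, so each step only appends the new token's K/V.
cache_k, cache_v = [], []
outputs_cached = []
for (k, v), q in zip(tokens_kv, queries):
    cache_k.append(k)
    cache_v.append(v)
    outputs_cached.append(attend(q, cache_k, cache_v))

# Recomputing all K/V from scratch at every step gives identical outputs,
# just with redundant work that grows with sequence length.
outputs_full = []
for t, q in enumerate(queries, start=1):
    ks = [kv[0] for kv in tokens_kv[:t]]
    vs = [kv[1] for kv in tokens_kv[:t]]
    outputs_full.append(attend(q, ks, vs))

assert all(
    abs(a - b) < 1e-12
    for oc, of in zip(outputs_cached, outputs_full)
    for a, b in zip(oc, of)
)
```

In real model code (e.g. HuggingFace `past_key_values`), the cache additionally lets the model skip re-running all transformer layers over the prompt, which is where most of the per-task savings the PR mentions would come from.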