NVIDIA / RULER

This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?
Apache License 2.0

RULER with Mamba #41

Closed Andron00e closed 4 months ago

Andron00e commented 4 months ago

Hi! Are there any results available for State Spaces?

hsiehjackson commented 4 months ago

Hi! We have results for Jamba (a hybrid base model combining Mamba and Transformer layers) in our paper. We also evaluated this Mamba model https://huggingface.co/state-spaces/mamba-2.8b-slimpj, which was not trained with long sequence lengths. If you are interested in Mamba-related models, I recommend reading this paper https://arxiv.org/pdf/2406.07887. They have results using RULER!