Test results for the June sneaky update of the Phi 3 models?

NVIDIA / RULER

This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?

Apache License 2.0


Test results for the June sneaky update of the Phi 3 models? #49

Closed: bhugueney closed this issue 1 month ago

bhugueney commented 3 months ago

Thank you for this most excellent project! In June 2024, Microsoft sneakily updated their Phi 3 model, and the update greatly improved its use of long context. The model card now reports:

RULER: a retrieval-based benchmark for long context understanding

| Model | 4K | 8K | 16K | 32K | 64K | 128K | Average |
| -- | -- | -- | -- | -- | -- | -- | -- |
| Original | 86.7 | 78.1 | 75.6 | 70.3 | 58.9 | 43.3 | 68.8 |
| June 2024 Update | 92.4 | 91.1 | 90.8 | 87.9 | 79.8 | 65.6 | 84.6 |
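For anyone who wants to sanity-check these numbers, here is a minimal sketch that recomputes the Average column and the per-length gain from the scores in the table above. It assumes the Average is the unweighted mean of the six context lengths, rounded to one decimal; the helper names `average` and `gains` are made up for illustration and are not part of RULER.

```python
# Scores copied from the table above (RULER score by context length).
LENGTHS = ["4K", "8K", "16K", "32K", "64K", "128K"]
SCORES = {
    "Original": [86.7, 78.1, 75.6, 70.3, 58.9, 43.3],
    "June 2024 Update": [92.4, 91.1, 90.8, 87.9, 79.8, 65.6],
}


def average(scores):
    """Unweighted mean rounded to one decimal (assumed to match the Average column)."""
    return round(sum(scores) / len(scores), 1)


def gains(old, new):
    """Per-length improvement of `new` over `old`, rounded to one decimal."""
    return [round(n - o, 1) for o, n in zip(old, new)]


if __name__ == "__main__":
    for model, scores in SCORES.items():
        print(f"{model}: average = {average(scores)}")

    deltas = gains(SCORES["Original"], SCORES["June 2024 Update"])
    for length, delta in zip(LENGTHS, deltas):
        print(f"{length}: +{delta}")
```

Under that assumption the recomputed averages match the reported 68.8 and 84.6, and the gain grows with context length, from +5.7 at 4K to +22.3 at 128K.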

Would you mind adding this version to your table?

Thx.

hsiehjackson commented 3 months ago

Thanks for the information! I re-evaluated phi3-mini and put the results on our leaderboard.